Abstract

We present a new probabilistic modelling framework based on the recent notion of normal factor graphs (NFGs). We show that the proposed NFG models and their transformations unify some existing models such as factor graphs, convolutional factor graphs, and cumulative distribution networks. The two subclasses of NFG models, namely the constrained and generative models, exhibit a duality in their dependence structure. Transformation of NFG models further extends the power of this modelling framework. We point out the well-known NFG representations of parity and generator realizations of a linear code as generative and constrained models, and comment on a more prevalent duality in this context. Finally, we address the algorithmic aspect of computing the exterior function of NFGs and the inference problem on NFGs.

1 Introduction

In recent years, probabilistic graphical models have emerged from different disciplines as a powerful methodology for statistical inference and machine learning. Traditional models of this kind, such as Bayesian networks [1] and Markov random fields [2], primarily aim to represent the joint probability distribution (i.e., the probability mass function or probability density function) of the random variables (RVs) of interest in terms of its multiplicative factorization structure. Such "multiplicative" modelling semantics can be translated into the language of factor graphs (FGs) [3], a mathematical and graphical framework that is convenient and intuitive for representing the multiplicative factorization of a multivariate function. It is arguable that FGs and their variants, such as directed factor graphs [4], unify the various multiplicative models of this kind [5]. In contrast to the multiplicative models, convolutional factor graphs (CFGs) [6] represent the joint distribution of interest in terms of convolutional factorizations. The power of the CFG modelling framework has recently been demonstrated by a derivative of the CFG model, known as the linear characteristic model (LCM) [7], for inference in stable distributions. Instead of directly representing the distribution of interest, LCM represents the characteristic function of the distribution. This is advantageous for stable distributions, which in general are explicitly defined only in the characteristic-function domain. We argue that the philosophy of modelling in a "transform domain", as manifested in LCM, should not be overlooked, because the particular nature of an inference task may favour representing some object other than the probability distribution. Incidentally or not, cumulative distribution functions, which may be viewed as transformations of probability distributions, appear more favourable in structured ranking problems, and this recognition has led to the development of cumulative distribution networks (CDNs) [8].
In this paper, we present a new graphical model, the normal factor graph model, based on the notion of normal factor graphs (NFGs) [9], [10]. In the framework of NFGs, a powerful tool called holographic transformation has been developed. It was shown in [9] that this tool unifies a duality theorem of Forney [11] in coding theory and the Holant theorem of Valiant [12] in complexity theory.

The main objective of this paper is to show that the proposed NFG models, together with the holographic transformation technique, essentially unify all the probabilistic models mentioned above. We will focus on two subfamilies of NFG models, namely constrained and generative NFG models. We will show that constrained NFG models reduce to FGs, albeit with a different interpretation, and that generative NFG models, restricted to a special case, reduce to CFGs. In addition, we reveal an interesting "duality" between the constrained and generative NFG models in their independence properties. We also introduce a general model transformation technique, with which we show that a CDN is equivalent to a transformed NFG model.

2 Probabilistic Graphical Models 

Here we give a brief summary of the previous graphical models relevant to this work. As a notational convention used throughout the paper, an RV is denoted by a capital letter, e.g., X, Y, ..., and the value it takes is denoted by the corresponding lower-case letter, i.e., x, y, ....

2.1 Factor Graphs 

A factor graph (FG) [3] is a bipartite graph $(V \cup U, E)$ with independent vertex sets $V$ and $U$ and edge set $E$, where each vertex $v \in V$ is associated with a variable $x_v$ from a finite alphabet $\mathcal{X}_v$, and each vertex $u \in U$ is associated with a complex-valued function $f_u$ on the Cartesian product $\mathcal{X}_{\mathrm{ne}(u)} := \prod_{v \in \mathrm{ne}(u)} \mathcal{X}_v$, where $\mathrm{ne}(u) := \{v \in V : \{u, v\} \in E\}$ is the set of neighbors (adjacent vertices) of $u$.

Each function $f_u$ is referred to as a local function, and the FG is said to represent the function

$$f(x_V) := \prod_{u \in U} f_u(x_{\mathrm{ne}(u)}),$$

where we use the "variable set" notation $x_A := \{x_a : a \in A\}$, defined for any $A \subseteq V$. In the context of FGs, the function represented by the FG is often called the global function. Fig. 1(a) shows an example FG.
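As a minimal sketch of the definition above (assuming numpy, alphabets {0, 1, 2}, randomly generated tables standing in for hypothetical local functions, and the argument lists read from the caption of Fig. 1), the global function of the FG in Fig. 1(a) is simply the pointwise product of the local functions:

```python
import itertools

import numpy as np

# Hypothetical local functions for the FG of Fig. 1(a); every alphabet is
# {0, 1, 2} and the tables are arbitrary non-negative numbers.
rng = np.random.default_rng(0)
f1 = rng.random((3, 3))  # f1(x1, x2)
f2 = rng.random((3, 3))  # f2(x1, x3)
f3 = rng.random((3, 3))  # f3(x1, x3)


def global_function(x1, x2, x3):
    """Product of the local functions, each evaluated on its neighbours."""
    return f1[x1, x2] * f2[x1, x3] * f3[x1, x3]


# Tabulate the global function over the whole configuration space.
f = np.zeros((3, 3, 3))
for x1, x2, x3 in itertools.product(range(3), repeat=3):
    f[x1, x2, x3] = global_function(x1, x2, x3)

# The same table via einsum: no index is summed, only multiplied.
f_einsum = np.einsum("ab,ac,ac->abc", f1, f2, f3)
```

The einsum form anticipates the NFG exterior function later on, where indices that do not appear in the output are summed over.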




Figure 1: An example of an FG, a CFG, and a CDN: (a) when viewed as an FG, the graph represents the global function $f_1(x_1, x_2) f_2(x_1, x_3) f_3(x_1, x_3)$; (b) as a CFG, the graph represents the global function $f_1(x_1, x_2) * f_2(x_1, x_3) * f_3(x_1, x_3)$; and (c) as a CDN, the graph is understood as an FG in which each local function is a cumulative distribution, in which case the global function $f_1(x_1, x_2) f_2(x_1, x_3) f_3(x_1, x_3)$ satisfies the properties of a cumulative distribution function and is taken as the joint cumulative distribution of the RVs $X_1$, $X_2$ and $X_3$.






Since independence (or conditional independence) relationships among RVs are often captured via the multiplicative factorization of their joint probability distribution, FGs, when used to represent the joint distribution of RVs, form a convenient probabilistic model.

The relationship between the FG probabilistic model and other classical probabilistic models, such as Bayesian networks and Markov random fields, is well known; see, e.g., [3]. For these models, all of which feature "multiplicative semantics" and aim to represent joint distributions, efficient inference algorithms such as belief propagation (the sum-product algorithm) have been developed and have demonstrated great power in various applications.

2.2 Convolutional Factor Graphs 

Let $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{X}_3$ be three (possibly distinct) finite sets. In general, we require the sets to have an abelian group structure, so that addition "+" and its inverse "−" are well defined. The requirement that the sets be finite is not critical; it is imposed only for convenience of argument.

Let $f_1$ and $f_2$ be two functions on $\mathcal{X}_1 \times \mathcal{X}_2$ and $\mathcal{X}_2 \times \mathcal{X}_3$, respectively. The convolution of $f_1$ and $f_2$, denoted $f_1 * f_2$, is defined as the function on $\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{X}_3$ given by

$$(f_1 * f_2)(x_1, x_2, x_3) := \sum_{x \in \mathcal{X}_2} f_1(x_1, x_2 - x)\, f_2(x, x_3).$$

Following the convention in [5], we may write $(f_1 * f_2)(x_1, x_2, x_3)$ as $f_1(x_1, x_2) * f_2(x_2, x_3)$ to emphasize the domains of the original functions. It is not hard to show that the convolution as defined above is both associative and commutative.
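The definition can be made concrete with a short sketch. It assumes $\mathcal{X}_2$ is the cyclic group $\mathbb{Z}_n$ (so "−" is subtraction mod $n$), uses numpy, and `convolve` is a hypothetical helper name; it checks commutativity by evaluating the swapped form $\sum_x f_2(x_2 - x, x_3) f_1(x_1, x)$:

```python
import numpy as np


def convolve(f1, f2):
    """(f1 * f2)(x1, x2, x3) = sum_x f1(x1, x2 - x) f2(x, x3), with X2 = Z_n."""
    n1, n2 = f1.shape
    n2b, n3 = f2.shape
    assert n2 == n2b, "f1 and f2 must share the alphabet X2"
    out = np.zeros((n1, n2, n3))
    for x2 in range(n2):
        for x in range(n2):
            # f1[:, (x2 - x) % n2] runs over x1, f2[x, :] runs over x3.
            out[:, x2, :] += np.outer(f1[:, (x2 - x) % n2], f2[x, :])
    return out


rng = np.random.default_rng(1)
f1 = rng.random((2, 4))  # f1 on X1 x Z_4
f2 = rng.random((4, 3))  # f2 on Z_4 x X3
h = convolve(f1, f2)

# Commutativity: sum_x f2(x2 - x, x3) f1(x1, x) gives the same table.
h_swapped = np.zeros_like(h)
for x2 in range(4):
    for x in range(4):
        h_swapped[:, x2, :] += np.outer(f1[:, x], f2[(x2 - x) % 4, :])
```

Summing the result over all three variables also gives the product of the totals of $f_1$ and $f_2$, which is a quick sanity check on the indexing.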

A convolutional factor graph (CFG) [6] is a bipartite graph that represents a global function that factors as the convolution of local functions. In fact, the representation semantics of a CFG is identical to that of an FG (often referred to as a "multiplicative" FG for distinction), except that the convolution defined above is used as the product operation; cf. Fig. 1(b) for an example CFG. In [5], CFGs were presented as a probabilistic graphical model representing the joint probability distribution of a set of observed RVs that are constructed from a collection of independent sets of latent RVs via linear combinations. In addition, the authors of [6] presented an elegant duality result between FGs and CFGs via the Fourier transform. This duality and CFGs have recently been exploited in [7], in what is known as the linear characteristic model (LCM), for solving inference problems with stable distributions.
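The FG/CFG duality can be checked numerically in a small case. The sketch below assumes every alphabet is a cyclic group and uses numpy's multidimensional DFT as the Fourier transform (an assumption for illustration; [6] treats general finite abelian groups). Substituting $y = x_2 - x$ in the transform of $f_1 * f_2$ shows that it equals $F_1(\omega_1, \omega_2) F_2(\omega_2, \omega_3)$, a multiplicative factorization on the shared frequency $\omega_2$:

```python
import numpy as np


def convolve(f1, f2):
    """(f1 * f2)(x1, x2, x3) = sum_x f1(x1, x2 - x) f2(x, x3), X2 cyclic."""
    n2 = f1.shape[1]
    out = np.zeros((f1.shape[0], n2, f2.shape[1]), dtype=complex)
    for x2 in range(n2):
        for x in range(n2):
            out[:, x2, :] += np.outer(f1[:, (x2 - x) % n2], f2[x, :])
    return out


rng = np.random.default_rng(2)
f1 = rng.random((3, 4))  # f1 on Z_3 x Z_4
f2 = rng.random((4, 5))  # f2 on Z_4 x Z_5

# Multidimensional DFT of each function.
F1 = np.fft.fftn(f1)
F2 = np.fft.fftn(f2)
F12 = np.fft.fftn(convolve(f1, f2))

# Duality: the transform of the convolution is the pointwise product of the
# transforms, sharing the frequency index b (no summation in "abc").
product = np.einsum("ab,bc->abc", F1, F2)
```

In other words, a CFG in the signal domain becomes a multiplicative FG in the frequency domain, which is the observation LCM builds on.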

2.3 Cumulative Distribution Networks 

Let $X$ be an RV taking values in a finite ordered set $\mathcal{X}$. The cumulative distribution function (CDF) of $X$ is defined as $F_X(x) := \sum_{y \le x} p_X(y)$, where $p_X$ is the probability distribution of $X$. We note that this definition of the CDF, as a function on $\mathcal{X}$, is slightly different from the classical definition, in which the CDF is a function defined on the real line (or on Euclidean space in the multivariate case). It nevertheless captures the same essence and is merely a different representation, suitable and convenient in the context of this paper. This notion of CDF extends to any collection of RVs $X_1, \dots, X_n$ taking values in the finite ordered sets $\mathcal{X}_1, \dots, \mathcal{X}_n$ by defining their joint CDF as

$$F_{X_1, \dots, X_n}(x_1, \dots, x_n) := \sum_{y_1 \le x_1, \dots, y_n \le x_n} p_{X_1, \dots, X_n}(y_1, \dots, y_n),$$

where $p_{X_1, \dots, X_n}$ is the joint probability distribution of $X_1, \dots, X_n$. Note that while a marginal probability distribution is computed by summing the joint probability distribution over the range of the marginalized RVs, a marginal CDF is computed by evaluating the joint CDF at the largest element of $\mathcal{X}_i$ for every marginalized RV $X_i$. (That is, if $I$ indexes the set of marginalized RVs and $X_i$ takes its values from the ordered set $\{1, \dots, |\mathcal{X}_i|\}$, then we evaluate the joint CDF at $x_i = |\mathcal{X}_i|$ for all $i \in I$.) It is well known that CDFs satisfy a collection of properties, as articulated in standard textbooks and in [8]. Conversely, any function satisfying these properties, which we shall refer to as the "CDF axioms", may be regarded as a CDF and can be used to define a collection of RVs.
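A minimal sketch of the discrete CDF and of marginalization by evaluation (assuming numpy, 0-indexed ordered alphabets, and a randomly generated joint pmf used only for illustration):

```python
import numpy as np

# A hypothetical joint pmf of (X1, X2) on {0, 1, 2} x {0, 1, 2, 3}.
rng = np.random.default_rng(3)
p = rng.random((3, 4))
p /= p.sum()

# Joint CDF: F(x1, x2) = sum_{y1 <= x1, y2 <= x2} p(y1, y2).
F = p.cumsum(axis=0).cumsum(axis=1)

# Marginal CDF of X1: evaluate the joint CDF at the largest element of X2 ...
F1_from_joint = F[:, -1]
# ... which agrees with the CDF of the marginal pmf obtained by summing out X2.
F1_direct = p.sum(axis=1).cumsum()
```

The contrast in the text is visible here: the pmf marginal needs a sum over $x_2$, whereas the CDF marginal is a single slice of the table.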

A cumulative distribution network (CDN) [8] is a multiplicative FG in which each local function satisfies the CDF axioms; it is then straightforward to show that the global function represented by the FG also satisfies the CDF axioms. The global function thus defines a collection of random variables, each represented by a variable node in the FG, and the CDN may serve as a probabilistic model. In [8], it was shown that CDNs are useful for structured ranking problems, and efficient inference algorithms for such problems were developed for these models. See Fig. 1(c) for an example CDN.
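A small sketch of why a product of CDFs is again a CDF, assuming numpy and randomly generated local CDFs (`random_cdf` and `pmf_from_cdf` are hypothetical helpers). If $F_1$ is the CDF of a random vector $(Y_1, Y_2)$ and $F_2$ that of an independent $(Z_1, Z_3)$, then $F_1(x_1, x_2) F_2(x_1, x_3)$ is the CDF of $(\max(Y_1, Z_1), Y_2, Z_3)$; numerically, inverting the cumulative sums yields a valid pmf:

```python
import numpy as np


def random_cdf(shape, rng):
    """CDF (cumulative sums along every axis) of a random pmf with this shape."""
    p = rng.random(shape)
    p /= p.sum()
    for axis in range(p.ndim):
        p = p.cumsum(axis=axis)
    return p


def pmf_from_cdf(F):
    """Undo the cumulative sums by first differences along every axis."""
    p = F
    for axis in range(F.ndim):
        p = np.diff(p, axis=axis, prepend=0.0)
    return p


rng = np.random.default_rng(4)
F1 = random_cdf((3, 3), rng)  # local CDF on (x1, x2)
F2 = random_cdf((3, 3), rng)  # local CDF on (x1, x3)

# CDN global function: the pointwise product of the local CDFs.
F = np.einsum("ab,ac->abc", F1, F2)
p = pmf_from_cdf(F)
```

Non-negativity of the recovered pmf and total mass one are exactly the discrete CDF axioms being checked.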

3 Normal Factor Graphs 

Now we give a quick overview of the framework of normal factor graphs (NFGs) and develop some notation and definitions for subsequent discussions.

3.1 NFG and the exterior function 

A normal factor graph (NFG) [9], [10] is a graph $(V, E)$ with vertex set $V$ and edge set $E$, where $E$ consists of two types of edges: a set $T$ of regular edges (also called internal edges), each incident on two vertices, and a set $L$ of "half edges" (also called external edges or dangling edges), each incident on exactly one vertex. Every edge $e \in E$ is associated with a variable $x_e$ from a finite alphabet $\mathcal{X}_e$, and every vertex $v \in V$ is associated with a local function $f_v$ on the Cartesian product $\mathcal{X}_{E(v)} := \prod_{e \in E(v)} \mathcal{X}_e$, where $E(v)$ is the set of (internal and external) edges incident on $v$. In places we may use $T(v)$ and $L(v)$ to denote the internal and external edges incident on vertex $v$, respectively; clearly $E(v) = T(v) \cup L(v)$ for all $v$. We use the symbol $\mathcal{G}$ to refer to an NFG, and sometimes write $\mathcal{G}(V, E, f_V)$, where $f_V := \{f_v : v \in V\}$, to emphasize the NFG parameters.

An NFG $\mathcal{G}$ is associated with a function, called the exterior function and denoted by $Z_{\mathcal{G}}$, on the Cartesian product $\mathcal{X}_L := \prod_{e \in L} \mathcal{X}_e$, defined as

$$Z_{\mathcal{G}}(x_L) := \sum_{x_T} \prod_{v \in V} f_v(x_{E(v)}).$$

That is, the exterior function realized by an NFG is the product of all its local functions with the internal variables (edges) summed over. An example NFG is shown in Fig. 2.
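The exterior function maps directly onto a tensor contraction. As a sketch (assuming numpy, binary alphabets, and random tables as hypothetical local functions for the NFG of Fig. 2), `einsum` multiplies the local functions and sums every index that does not appear in the output, i.e. exactly the internal edges $s_1, \dots, s_5$:

```python
import itertools

import numpy as np

# Hypothetical local functions for the NFG of Fig. 2; every alphabet is {0, 1}.
rng = np.random.default_rng(5)
f1 = rng.random((2, 2, 2))     # f1(x1, s1, s2)
f2 = rng.random((2, 2, 2, 2))  # f2(x2, s2, s3, s5)
f3 = rng.random((2, 2))        # f3(s3, s4)
f4 = rng.random((2, 2, 2))     # f4(s1, s4, s5)

# Index letters: a=x1, b=x2, i=s1, j=s2, k=s3, l=s4, m=s5.
# Only the half-edge variables a, b survive; i..m are summed out.
Z = np.einsum("aij,bjkm,kl,ilm->ab", f1, f2, f3, f4)

# Brute-force evaluation of the same sum-of-products form.
Z_brute = np.zeros((2, 2))
for x1, x2 in itertools.product(range(2), repeat=2):
    for s1, s2, s3, s4, s5 in itertools.product(range(2), repeat=5):
        Z_brute[x1, x2] += (f1[x1, s1, s2] * f2[x2, s2, s3, s5]
                            * f3[s3, s4] * f4[s1, s4, s5])
```

Each internal edge letter appears in exactly two operands and each half-edge letter in exactly one, mirroring the graph structure.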

Let $V = \{1, \dots, |V|\}$. In places, for notational convenience, we may denote the right-hand side of the above equation, a "sum-of-products form", by $\langle f_1, f_2, \dots, f_{|V|} \rangle$. This notation is valid due to the distributive law and the associativity and commutativity of addition and multiplication, which make the bracketing and ordering of the arguments in the notation $\langle \cdot, \dots, \cdot \rangle$ irrelevant. For example, the sum-of-products form encoded by the NFG in Fig. 2 may be written as $\langle f_1(x_1, s_1, s_2), f_2(x_2, s_2, s_3, s_5), f_3(s_3, s_4), f_4(s_1, s_4, s_5) \rangle$, or even $\langle f_1, f_2, f_3, f_4 \rangle$ for simplicity.

We say that two NFGs are equivalent if they realize the same exterior function. In places, we may extend this notion of equivalence to include other graphical models and say, for instance, "an FG is equivalent to an NFG," meaning that the global function of the FG is equal to the exterior function of the NFG.

Figure 2: An NFG realizing the exterior function $\sum_{s_1, \dots, s_5} f_1(x_1, s_1, s_2) f_2(x_2, s_2, s_3, s_5) f_3(s_3, s_4) f_4(s_1, s_4, s_5)$.

Finally, we call an NFG with no loops or parallel internal edges a simple NFG, and an NFG whose underlying graph $(I \cup J, E)$ is bipartite a bipartite NFG, where $I$ and $J$ are the two independent vertex sets. We impose no restriction on the cycle structure of NFGs.

3.2 Special kinds of local functions 

The following (non-disjoint) classes of local functions will be of particular interest. 

Split functions. Let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be finite sets. We say that a function $f$ on $\mathcal{X}_1 \times \dots \times \mathcal{X}_n$ is a split function via $x_1$, and refer to $x_1$ as the splitting variable, if

$$f(x_1, \dots, x_n) = f_2(x_1, x_2)\, f_3(x_1, x_3) \cdots f_n(x_1, x_n)$$

for some bivariate functions $f_2, \dots, f_n$. It follows immediately that any bivariate function is trivially a split function (via either of its arguments). Henceforth, if we do not explicitly specify the splitting variable of a split function, it is assumed to be the function's first argument. Graphically, we draw a split function as in Fig. 3(a), where the inward-directed edge distinguishes the splitting argument, and the remaining arguments are encountered successively in a counter-clockwise manner with respect to the directed edge.
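For a trivariate table, the definition has a simple linear-algebra test: $f$ splits via $x_1$ exactly when, for every fixed $x_1$, the slice $f(x_1, \cdot, \cdot)$ is an outer product, i.e. a matrix of rank at most one. The sketch below assumes numpy; `is_split_via_first` is a hypothetical helper, restricted to three arguments for brevity:

```python
import numpy as np


def is_split_via_first(f, tol=1e-10):
    """True if a trivariate table f(x1, x2, x3) factors as f2(x1, x2) f3(x1, x3).

    For each fixed x1, the slice f(x1, ., .) must have rank at most one.
    """
    return all(np.linalg.matrix_rank(f[x1], tol=tol) <= 1 for x1 in range(f.shape[0]))


rng = np.random.default_rng(6)
f2 = rng.random((3, 4))  # f2(x1, x2)
f3 = rng.random((3, 5))  # f3(x1, x3)
split = np.einsum("ab,ac->abc", f2, f3)  # split function via x1
generic = rng.random((3, 4, 5))          # almost surely not a split function
```

For more than three arguments the same idea applies with each slice required to be a rank-one tensor, which needs a tensor-rank check rather than `matrix_rank`.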

Conditional functions. Let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be finite sets. A function $f$ on $\mathcal{X}_1 \times \dots \times \mathcal{X}_n$ is said to be a conditional function of $x_1$ given $x_2, \dots, x_n$ if there is a constant $c$ such that $\sum_{x_1} f(x_1, \dots, x_n) = c$ for all $x_2, \dots, x_n$. Clearly, a non-negative real conditional function with $c = 1$ is a conditional probability distribution. A conditional function is shown in Fig. 3(b), where we use the same edge-labelling convention as for split functions, but with an outward-directed edge marking the first argument.

One may observe a sense of "duality" between a conditional function and a split function through 
the following lemma. 

Lemma 1. Let $f(x_1, x_2, x_3)$ be a positive real split function. Then, up to a scaling factor, $f$ may be written as $p_{X_1}(x_1)\, p_{X_2|X_1}(x_2|x_1)\, p_{X_3|X_1}(x_3|x_1)$ for some probability distributions $p_{X_1}$, $p_{X_2|X_1}$ and $p_{X_3|X_1}$.

Proof. Since $f$ is a positive real function, it may be viewed (up to a scaling factor) as a probability distribution of some RVs $X_1$, $X_2$ and $X_3$. Hence, $f$ can be written as $f(x_1, x_2, x_3) = p_{X_1}(x_1)\, p_{X_2|X_1}(x_2|x_1)\, p_{X_3|X_1 X_2}(x_3|x_1, x_2)$. Since $f$ is a split function via $x_1$, as we will see in a later lemma, we have $X_2 \perp X_3 \mid X_1$, and the claim follows. □
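Lemma 1 can be checked numerically. The sketch below (assuming numpy and a randomly generated positive split function) normalizes $f$ into a joint pmf, extracts $p_{X_1}$, $p_{X_2|X_1}$ and $p_{X_3|X_1}$, and confirms that their product recovers the pmf, i.e. $X_2$ and $X_3$ are conditionally independent given $X_1$:

```python
import numpy as np

rng = np.random.default_rng(7)
# A positive split function via x1: f(x1, x2, x3) = f2(x1, x2) f3(x1, x3).
f = np.einsum("ab,ac->abc", rng.random((3, 4)) + 0.1, rng.random((3, 5)) + 0.1)

p = f / f.sum()                           # view f (up to scale) as a joint pmf
p1 = p.sum(axis=(1, 2))                   # p_{X1}
p2_given_1 = p.sum(axis=2) / p1[:, None]  # p_{X2|X1}
p3_given_1 = p.sum(axis=1) / p1[:, None]  # p_{X3|X1}

# Lemma 1: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1).
reconstructed = np.einsum("a,ab,ac->abc", p1, p2_given_1, p3_given_1)
```

For a generic positive table that does not split, the same reconstruction would fail, since the third factor would need to depend on $x_2$ as well.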
Figure 3: A graphical illustration of: (a) a split function, (b) a conditional function, (c) $\delta_=$, and (d), (e) the max and sum indicator functions.

Now compare a split function $f(x_1, x_2, x_3)$ with a conditional function $g(x_1, x_2, x_3)$, and assume that the respective scaling constants turning the functions into distributions are both 1. If we draw the Bayesian networks (BNs) [1] corresponding to the two distributions $f$ and $g$, we see that the directions of the edges in the BN of $f$ are exactly opposite to those in the BN of $g$ (Fig. 4). In terms of causality, one may say: the distribution $f$ prescribes that, conditioned on the RV $X_1$, the RVs $X_2$ and $X_3$ are generated independently, whereas the distribution $g$ prescribes that $X_2$ and $X_3$ jointly generate $X_1$. This sense of "duality" or "reciprocity" (which evidently extends to the two kinds of functions with an arbitrary number of variables) also justifies our notation of opposite edge directions for the two kinds of functions.




Figure 4: The BNs corresponding to: (a) a split function, and (b) a conditional function. 

Indicator functions and Iverson's convention. An indicator function is a $\{0, 1\}$-valued function. In many places we will use Iverson's convention $[P]$ to denote the indicator function of the true/false proposition $P$, defined by $[P] := 1$ if $P$ is true and $[P] := 0$ otherwise. For any $x, y \in \mathcal{X}$, we will often use the proposition $x \stackrel{?}{=} y$, which is defined to be "true" if $x = y$ and "false" otherwise. Henceforth, we will use the symbol "=" instead of "$\stackrel{?}{=}$", and assume that the distinction from the usual use of "=" for assignment is clear from the context; namely, any occurrence of "=" inside, and only inside, Iverson brackets refers to "$\stackrel{?}{=}$". We will mainly be interested in the following indicator functions:

Evaluation indicator. Let $\mathcal{X}$ be a finite alphabet. The evaluation indicator (evaluating at some $\hat{x} \in \mathcal{X}$) is the indicator function on $\mathcal{X}$ defined as $\delta_{\hat{x}}(x) := [x = \hat{x}]$ for all $x \in \mathcal{X}$.

Equality indicator. Let $\mathcal{X}$ be a finite alphabet. The equality indicator on $n$ variables is defined as $\delta_=(x_1, \dots, x_n) := \prod_{i=2}^{n} [x_1 = x_i]$ for all $x_1, \dots, x_n \in \mathcal{X}$. Note that an equality indicator is a split function via any of its arguments; hence, in many places, we may draw it without a directed edge.
Constant-one indicator. Let $\mathcal{X}$ be a finite alphabet. The constant-one indicator is the degenerate indicator function defined as $\mathbf{1}(x) := 1$ for all $x \in \mathcal{X}$.

Sum indicator. Let $\mathcal{X}$ be a finite abelian group (written additively). The sum indicator on $n$ variables is defined as $\delta_\Sigma(x_1, \dots, x_n) := [x_1 = x_2 + \dots + x_n]$ for all $x_1, \dots, x_n \in \mathcal{X}$. A closely related indicator function, more popular in the factor graph and normal graph literature, is the parity indicator function, denoted $\delta_+$ and defined as $\delta_+(x_1, \dots, x_n) := [x_1 + \dots + x_n = 0]$. Clearly, $\delta_\Sigma(x_1, \dots, x_n) = \delta_+(x_1, -x_2, \dots, -x_n) = \delta_+(-x_1, x_2, \dots, x_n)$.
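The indicator functions above are one-liners in code. This sketch assumes the group $\mathbb{Z}_5$ (integers mod 5) purely for illustration; the function names are hypothetical. It checks the stated relation between the sum and parity indicators exhaustively for three variables:

```python
import itertools


def delta_eq(*xs):
    """Equality indicator: [x1 = x2] [x1 = x3] ... [x1 = xn]."""
    return int(all(x == xs[0] for x in xs[1:]))


def delta_sum(n, *xs):
    """Sum indicator on Z_n: [x1 = x2 + ... + xk]."""
    return int(xs[0] % n == sum(xs[1:]) % n)


def delta_parity(n, *xs):
    """Parity indicator on Z_n: [x1 + ... + xk = 0]."""
    return int(sum(xs) % n == 0)


n = 5  # the group Z_5, chosen for illustration
identity_holds = all(
    delta_sum(n, x1, x2, x3)
    == delta_parity(n, x1, -x2, -x3)
    == delta_parity(n, -x1, x2, x3)
    for x1, x2, x3 in itertools.product(range(n), repeat=3)
)
```

Python's `%` returns a non-negative remainder for a positive modulus, so negated group elements are handled correctly.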

An elementary result concerning sum indicator function is the following lemma. 

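Lemma 2. For any functions $f$ on $\mathcal{X}_1 \times \mathcal{X}_2$ and $g$ on $\mathcal{X}_2 \times \mathcal{X}_3$, where $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{X}_3$ are abelian groups,

$$\sum_{t, u} f(x, t)\, g(u, z)\, \delta_\Sigma(y, t, u) = f(x, y) * g(y, z),$$

where $\delta_\Sigma$ above is defined on $\mathcal{X}_2$.

Proof. The indicator $\delta_\Sigma(y, t, u)$ forces $t = y - u$, so the left-hand side equals $\sum_{u} f(x, y - u)\, g(u, z)$, which is $(f * g)(x, y, z)$ by the definition of the convolution. □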
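A brute-force check of Lemma 2, assuming numpy, $\mathcal{X}_2 = \mathbb{Z}_4$ and random tables for $f$ and $g$ (all sizes chosen only for illustration):

```python
import itertools

import numpy as np

n1, n2, n3 = 2, 4, 3  # X2 = Z_4 is the group being convolved over
rng = np.random.default_rng(8)
f = rng.random((n1, n2))
g = rng.random((n2, n3))

# Left-hand side: sum over t, u of f(x, t) g(u, z) [y = t + u].
lhs = np.zeros((n1, n2, n3))
for x, y, z, t, u in itertools.product(range(n1), range(n2), range(n3), range(n2), range(n2)):
    if y == (t + u) % n2:
        lhs[x, y, z] += f[x, t] * g[u, z]

# Right-hand side: (f * g)(x, y, z) = sum_u f(x, y - u) g(u, z).
rhs = np.zeros((n1, n2, n3))
for x, y, z, u in itertools.product(range(n1), range(n2), range(n3), range(n2)):
    rhs[x, y, z] += f[x, (y - u) % n2] * g[u, z]
```

Graphically, this is the statement that a sum-indicator vertex joining two local functions turns their product semantics into convolution semantics.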

Max indicator. Let $\mathcal{X}$ be an ordered finite set. The max indicator on $n$ variables is defined as $\delta_{\max}(x_1, \dots, x_n) := [x_1 = \max(x_2, \dots, x_n)]$ for all $x_1, \dots, x_n \in \mathcal{X}$. Let $I$ be a finite set and let $\mathcal{X}_i$ be an ordered finite set for all $i \in I$. The definition of the max indicator is extended to the partially ordered set $\mathcal{X}_I := \prod_{i \in I} \mathcal{X}_i$ by defining $\delta_{\max}(x_I, \dots, x'_I) := \prod_{i \in I} \delta_{\max}(x_i, \dots, x'_i)$ for all $x_I, \dots, x'_I \in \mathcal{X}_I$.

A graphical illustration of the above local functions is shown in Fig. 3. Note that the max and sum indicator functions are both conditional functions, as indicated by the directed edges in Figs. 3(d) and (e). It is also worth noting that the bivariate max indicator and the bivariate sum indicator are both equivalent to the bivariate equality indicator, which is both a split and a conditional function.
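The two claims in the paragraph above are easy to verify exhaustively on a small alphabet. The sketch assumes the alphabet {0, 1, 2, 3}, viewed both as an ordered set and as $\mathbb{Z}_4$; the function names are hypothetical:

```python
import itertools


def delta_max(*xs):
    """Max indicator on an ordered set: [x1 = max(x2, ..., xn)]."""
    return int(xs[0] == max(xs[1:]))


def delta_sum(n, *xs):
    """Sum indicator on Z_n: [x1 = x2 + ... + xk]."""
    return int(xs[0] % n == sum(xs[1:]) % n)


m = 4  # alphabet {0, 1, 2, 3}

# Conditional-function property: summing over x1 gives the same constant
# (here c = 1) for every configuration of the remaining arguments.
max_is_conditional = all(
    sum(delta_max(x1, x2, x3) for x1 in range(m)) == 1
    for x2, x3 in itertools.product(range(m), repeat=2)
)
sum_is_conditional = all(
    sum(delta_sum(m, x1, x2, x3) for x1 in range(m)) == 1
    for x2, x3 in itertools.product(range(m), repeat=2)
)

# Bivariate case: both reduce to the equality indicator [x1 = x2].
bivariate_agree = all(
    delta_max(x1, x2) == delta_sum(m, x1, x2) == int(x1 == x2)
    for x1, x2 in itertools.product(range(m), repeat=2)
)
```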



3.3 Vertex merging/splitting and holographic transformations 

In the framework of NFGs, a pair of graphical procedures, known as vertex merging and vertex splitting, is particularly useful. In a vertex merging procedure, two vertices representing functions $f$ and $g$ are replaced with a single vertex representing the function $\langle f, g \rangle$; conversely, in a vertex splitting procedure, a vertex representing a function that can be expressed in the sum-of-products form $\langle f, g \rangle$ is replaced with an NFG realizing $\langle f, g \rangle$. The two procedures are illustrated in Fig. 5 and are closely related to the concept of opening/closing the box [THE]. Note that when we put a dashed box around $f$ and $g$, we mean that they are replaced with the single function node $\langle f, g \rangle$; in other words, the last two pictures in Fig. 5 refer to the same NFG. It is easy to see that the exterior function of any NFG is invariant under the vertex merging/splitting procedures [9].
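Vertex merging amounts to contracting part of the network first. In this sketch (assuming numpy and random tables for a hypothetical three-vertex NFG with half edges $x_1$, $x_2$ and internal edges $s$, $t$, $u$), merging $f_2$ and $f_3$ sums out the edge $u$ inside the new vertex, and the exterior function is unchanged:

```python
import numpy as np

rng = np.random.default_rng(9)
f1 = rng.random((2, 3, 3))  # f1(x1, s, t)
f2 = rng.random((3, 3, 4))  # f2(s, t, u)
f3 = rng.random((4, 2))     # f3(u, x2)

# Exterior function of the original NFG: sum out internal variables s, t, u.
Z = np.einsum("ast,stu,ub->ab", f1, f2, f3)

# Vertex merging: replace f2 and f3 by the single vertex <f2, f3>(s, t, x2),
# i.e. sum out the edge u that now lies inside the merged vertex.
merged = np.einsum("stu,ub->stb", f2, f3)
Z_merged = np.einsum("ast,stb->ab", f1, merged)
```

Read in reverse, the same identity is vertex splitting: any vertex whose function has the form `merged` may be replaced by the two-vertex subgraph.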





Figure 5: Vertex merging and vertex splitting.
This simple pair of procedures enables the notion of "holographic transformation" for NFGs, which transforms each local function of an NFG while keeping the graph topology. Further, the "generalized Holant theorem" relates the exterior function realized by the holographically transformed NFG to that realized by the original NFG. We summarize these notions below, and refer the interested reader to [9] for more details.

Suppose that $f$ and $g$ are two bivariate functions on $\mathcal{X} \times \mathcal{X}$ such that $\langle f(x, s), g(s, x') \rangle = \delta_=(x, x')$ for all $(x, x') \in \mathcal{X} \times \mathcal{X}$. Then inserting the pair of functions $f$ and $g$ into an NFG edge (regular or half) representing an $\mathcal{X}$-valued variable is equivalent to inserting the function $\delta_=$, which can be verified not to change the exterior function. The functions $f$ and $g$ in this case are called an inverse pair of transformers, and such a graphical procedure is called inverse-pair transformer insertion.

Given an NFG $\mathcal{G}$ with a set of half edges $L$, the following procedure defines a transformed NFG with the same topology as $\mathcal{G}$:

(H1) In each half edge $x_i$, insert a bivariate function $g_i(x_i, y_i)$. We may refer to such transformers as external transformers.

(H2) In each internal edge, insert an inverse pair of transformers. We may refer to such transformers as internal transformers.

(H3) For each original vertex $v$ in $\mathcal{G}$, apply the vertex merging procedure to merge $f_v$ with its surrounding transformers.

Such a transformation of an NFG is known as a holographic transformation. If we denote the resulting NFG by $\mathcal{G}'$, then clearly

$$Z_{\mathcal{G}'}(y_L) = \Big\langle Z_{\mathcal{G}}(x_L), \prod_{i \in L} g_i(x_i, y_i) \Big\rangle,$$

since only Step (H1) above affects the exterior function. This is essentially the generalized Holant theorem of [9], which in subsequent discussions will be referred to as the GHT.



4 NFG Models 

We now present a generic NFG probabilistic model. Formally, an NFG probabilistic model, or simply an NFG model, is an NFG whose exterior function is, up to scale, the joint distribution of some RVs (each represented by a half edge) and which satisfies the following two properties: 1) the NFG is bipartite and simple; 2) half edges are incident only on one independent vertex set, and exactly one half edge is incident on each vertex in this set. We call these vertices interface vertices, and the ones in the other vertex set latent vertices. We will call the corresponding functions indexed by these two vertex sets interface functions and latent functions, respectively, although we will be quite loose in speaking of a vertex and a function interchangeably, as we do for a variable and an edge/half edge. We will customarily denote the set of interface vertices by $I$ and the set of latent vertices by $J$.¹

Note that since each interface function has exactly one half edge incident on it, unless it is more convenient to make the distinction, we will subsequently identify the set of half edges using $I$, i.e., an interface vertex will index both its function and the half edge incident on it. An example NFG model is shown in Fig. 6(a), where the top-layer vertices are the interface vertices and the bottom-layer vertices are the latent vertices. If necessary, we may formally denote such an NFG using notation such as $\mathcal{G}(I \cup J, E, f_{I \cup J})$.

Figure 6: (a) An NFG model, (b) a constrained NFG model, (c) a generative NFG model.

¹ We note that requiring that no half edge be incident on the latent functions entails no loss of generality: if there is such a half edge, one may always insert a bivariate equality indicator function (or, equivalently, a bivariate max indicator or sum indicator) into the half edge, which converts the NFG into an equivalent one in which this half edge is turned into a regular edge. Since the bivariate equality indicator is both a split function and a conditional function, inserting such a function has no impact on our later restriction on the interface functions, where we require them to be all split functions or all conditional functions.

In this modelling framework, we will focus on two "dual" families of models, which we call the constrained NFG models and the generative NFG models, respectively (see, e.g., Figs. 6(b) and (c) for a quick preview). We will demonstrate how these models are related to previous models such as FGs and CFGs. We will also introduce "transformed NFG models" in a later section, a special case of which reduces to CDNs.

4.1 Constrained NFG model 

A constrained NFG model is an NFG model in which all interface functions are split functions via 
their respective external variables. 

To bring more intuition into this definition, we first take a slight digression and show in the 
following lemma that it is possible to "shape" a distribution by "random rejection". 

Lemma 3. Let $X$ be an RV with probability distribution $p_X$, where $X$ takes values in a finite set $\mathcal{X}$, and let $h$ be a normalized non-negative real function on $\mathcal{X}$ with non-empty support, where normalization is in the sense that $\max_x h(x) = 1$. Draw $x$ from $p_X$, accept it with probability $h(x)$, and reject it with probability $1 - h(x)$. If $x$ is accepted, output $x$; otherwise repeat the process until some $x'$ is drawn and accepted. Denote the output random variable by $Y$. Then the probability distribution $p_Y$ of $Y$ is, up to scale, $h(y)\, p_X(y)$, for all $y \in \mathcal{X}$.

Proof. Let $Z$ be a $\{0, 1\}$-valued RV representing the random rejection in the lemma, i.e., a sample $x \in \mathcal{X}$ is rejected if $Z = 0$ and accepted if $Z = 1$, where the probability that $Z = 1$ given $X = x$ is $h(x)$. Since the output is the first accepted sample, the event $Y = y$ corresponds to the event $X = y$ conditioned on $Z = 1$, and hence $p_Y(y) \propto p(X = y)\, p(Z = 1 \mid X = y) = p_X(y)\, h(y)$. □
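A simulation of the "random rejection" in Lemma 3, as a sketch using only Python's standard `random` module; $p_X$ and $h$ below are hypothetical values chosen for illustration, with $\max h = 1$:

```python
import random

# Hypothetical distribution p_X and shaping function h (max h = 1) on {0, 1, 2, 3}.
p_x = [0.1, 0.2, 0.3, 0.4]
h = [1.0, 0.5, 0.25, 0.0]


def shaped_sample(rng):
    """Draw x ~ p_X and accept with probability h(x); repeat until accepted."""
    while True:
        x = rng.choices(range(len(p_x)), weights=p_x)[0]
        if rng.random() < h[x]:
            return x


rng = random.Random(0)
n_samples = 50_000
counts = [0] * len(p_x)
for _ in range(n_samples):
    counts[shaped_sample(rng)] += 1
empirical = [c / n_samples for c in counts]

# Lemma 3: p_Y(y) is proportional to h(y) p_X(y).
weights = [hx * px for hx, px in zip(h, p_x)]
predicted = [w / sum(weights) for w in weights]
```

Here the predicted distribution is approximately [0.364, 0.364, 0.273, 0]; the value with $h = 0$ is never output, even though it is the most likely draw from $p_X$.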

The idea of "distribution shaping" via "random rejection" is central to the semantics of constrained NFG models, which we demonstrate in the next example.

Example 1. Let $\mathcal{G}$ be the constrained NFG in Fig. 7, where $f_1(x_1, s_1, s'_1)$ and $f_2(x_2, s_2, s'_2)$ are positive functions that split via $x_1$ and $x_2$, respectively, and $h_1$, $h_2$ and $h_3$ are non-negative functions (with non-empty supports). From Lemma 1 we may express, up to a respective scaling factor, $f_1$ as $p_{X_1}(x_1)\, p_{S_1|X_1}(s_1|x_1)\, p_{S'_1|X_1}(s'_1|x_1)$ and $f_2$ as $p_{X_2}(x_2)\, p_{S_2|X_2}(s_2|x_2)\, p_{S'_2|X_2}(s'_2|x_2)$, for some distributions $p_{X_1}$, $p_{X_2}$, $p_{S_1|X_1}$, $p_{S'_1|X_1}$, $p_{S_2|X_2}$ and $p_{S'_2|X_2}$. The RVs represented by the NFG may be regarded as being generated by the following process.

1. Draw $(x_1, x_2)$ from the distribution $p_{X_1}(x_1)\, p_{X_2}(x_2)$, where $p_{X_1}$ and $p_{X_2}$ are as specified by our choices above. Note that the two components of the drawn vector are independent.

2. Draw the vector $(s_1, s'_1)$ from the distribution $p_{S_1|X_1}(s_1|x_1)\, p_{S'_1|X_1}(s'_1|x_1)$ and draw $(s_2, s'_2)$ from $p_{S_2|X_2}(s_2|x_2)\, p_{S'_2|X_2}(s'_2|x_2)$. Clearly, the joint distribution of $(x_1, x_2, s_1, s'_1, s_2, s'_2)$ is, up to scale, $f_1(x_1, s_1, s'_1)\, f_2(x_2, s_2, s'_2)$.

3. Let $H(s_1, s'_1, s_2, s'_2) := c \cdot h_1(s_1)\, h_2(s'_1, s_2)\, h_3(s'_2)$, where $c$ is a normalizing constant such that the maximum value of $H(\cdot)$ is 1. Accept the drawn vector $(x_1, x_2, s_1, s'_1, s_2, s'_2)$ with probability $H(s_1, s'_1, s_2, s'_2)$ and reject it with probability $1 - H(s_1, s'_1, s_2, s'_2)$.

4. If the drawn vector is rejected, repeat the procedure from Step 1 until a drawn vector $(x_1, x_2, s_1, s'_1, s_2, s'_2)$ is accepted. By Lemma 3, the accepted vector has a distribution equal, up to scale, to $f_1(x_1, s_1, s'_1)\, f_2(x_2, s_2, s'_2)\, H(s_1, s'_1, s_2, s'_2)$.

5. Output $(x_1, x_2)$. Then clearly the output vector has a distribution that is, up to scale, the exterior function of the NFG.
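The five steps can be simulated directly. In this sketch every variable is binary and all the distributions and latent functions are hypothetical values chosen so that $\max H = 1$ (hence $c = 1$); the empirical distribution of the output $(x_1, x_2)$ is compared with the normalized exterior function computed by brute force:

```python
import itertools
import random

# Hypothetical ingredients for Example 1; every variable is binary.
p_x1, p_x2 = [0.6, 0.4], [0.3, 0.7]  # p_{X1}, p_{X2}
p_s1 = [[0.8, 0.2], [0.3, 0.7]]      # p_{S1|X1}(s1 | x1), indexed [x1][s1]
p_s1p = [[0.5, 0.5], [0.9, 0.1]]     # p_{S1'|X1}
p_s2 = [[0.2, 0.8], [0.6, 0.4]]      # p_{S2|X2}
p_s2p = [[0.7, 0.3], [0.4, 0.6]]     # p_{S2'|X2}
h1 = [1.0, 0.3]                      # h1(s1)
h2 = [[1.0, 0.1], [0.2, 0.9]]        # h2(s1', s2), indexed [s1'][s2]
h3 = [0.5, 1.0]                      # h3(s2')
# H = h1 h2 h3 already has maximum 1 * 1 * 1 = 1, so c = 1.


def draw(dist, rng):
    """Draw a binary value with P(1) = dist[1]."""
    return int(rng.random() < dist[1])


def sample(rng):
    """Steps 1-5 of Example 1: draw, accept with probability H, output (x1, x2)."""
    while True:
        x1, x2 = draw(p_x1, rng), draw(p_x2, rng)            # step 1
        s1, s1p = draw(p_s1[x1], rng), draw(p_s1p[x1], rng)  # step 2
        s2, s2p = draw(p_s2[x2], rng), draw(p_s2p[x2], rng)
        if rng.random() < h1[s1] * h2[s1p][s2] * h3[s2p]:    # steps 3 and 4
            return x1, x2                                     # step 5


def exterior(x1, x2):
    """Exterior function Z(x1, x2): sum over internal variables of f1 f2 h1 h2 h3."""
    total = 0.0
    for s1, s1p, s2, s2p in itertools.product((0, 1), repeat=4):
        f1 = p_x1[x1] * p_s1[x1][s1] * p_s1p[x1][s1p]
        f2 = p_x2[x2] * p_s2[x2][s2] * p_s2p[x2][s2p]
        total += f1 * f2 * h1[s1] * h2[s1p][s2] * h3[s2p]
    return total


rng = random.Random(1)
n = 30_000
counts = {}
for _ in range(n):
    key = sample(rng)
    counts[key] = counts.get(key, 0) + 1

configs = list(itertools.product((0, 1), repeat=2))
norm = sum(exterior(a, b) for a, b in configs)
empirical = {k: counts.get(k, 0) / n for k in configs}
predicted = {k: exterior(*k) / norm for k in configs}
```

Although $X_1$ and $X_2$ are drawn independently in Step 1, the coupling through $h_2(s'_1, s_2)$ generally makes the accepted pair dependent, which is the "checking system" effect described next.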

The procedure introduced in the example above generalizes in an obvious way to arbitrary constrained NFG models. Instead of stating the procedure precisely, but repetitively, for the general setting, we make the following remarks. The interface functions completely specify how the external variables are drawn and how the internal variables are drawn conditioned on the drawn external configuration. The drawn internal configuration then undergoes a "random rejection" according to the product of all latent functions. The external configurations giving rise to accepted internal configurations then necessarily follow the distribution prescribed by the exterior function of the NFG.



Figure 7: Example 1.



Analogously, one may view a constrained NFG model as a "probabilistic checking system": independent "inputs" (external variables) excite the "internal states" (internal variables) of the system via interface functions; the state configuration is "checked" probabilistically by the latent functions; and only the external configurations that pass the internal check are kept. In general, the internal checking mechanism induces dependence among the external variables, which were a priori independent. In the special case when the latent functions are all indicator functions, the checking system is in fact deterministic, reducing to a set of constraints on the internal states; cf. Section 6.
This has been the motivation behind the name "constrained NFG model". As we will show momentarily, constrained NFG models and FG models are equivalent; hence, the "probabilistic checking system" perspective of constrained models provides a new interpretation of FG models.

4.2 Constrained NFG models are equivalent to FGs 

Suppose that a constrained NFG model is such that every interface function is an equality indicator function. It is known [11] that one may convert such an NFG into an FG by the following procedure: replace each interface vertex by a variable vertex representing its half-edge variable, and remove the half edge.

Proposition 1. If in a constrained NFG model all interface functions are equality indicators, then 
the above procedure gives rise to an FG equivalent to the NFG. 

Proof. Let $\mathcal{G}(I \cup J, E, f_{I \cup J})$ be the NFG at hand, where $f_i = \delta_=$ for all $i \in I$. The resulting FG has underlying graph $(I \cup J, E)$, where $I$ and $J$ are the variable and function index sets, respectively. Hence, the global function of the FG is the product $\prod_{j \in J} f_j(x_{\mathrm{ne}(j)})$. On the other hand, if we use $T(v)$ to denote the set of internal edges incident on node $v$ in the NFG, then the exterior function of the NFG is $\big\langle \prod_{j \in J} f_j(s_{T(j)}), \prod_{i \in I} \delta_=(x_i, s_{T(i)}) \big\rangle$, which, whenever $i$ and $j$ are connected by an edge $t$, amounts to substituting $x_i$ for the argument $s_t$ of $f_j$ in the product $\prod_{j \in J} f_j(s_{T(j)})$, for all adjacent $i$ and $j$. The claim follows by noting that $T(j) = \{\{i, j\} : i \in \mathrm{ne}(j)\}$. □

The proposition essentially suggests that the joint distribution represented by such a constrained NFG model factors multiplicatively and therefore can be represented by an FG. In fact, the converse is also true, namely that any FG can be converted to an equivalent constrained NFG model with all interface functions being equality indicators. This is illustrated in Figure 8.




Figure 8: An FG and its equivalent constrained NFG. 



Next we show that any constrained NFG is in fact equivalent to one whose interface functions are all equality indicators. Let $\mathcal{G}$ be an arbitrary constrained NFG in which each interface function $f_i$ splits as $\prod_{t \in T(i)} f_t$ for some bivariate functions $f_t$. The following procedure converts $\mathcal{G}$ into a constrained NFG with the same underlying graph as $\mathcal{G}$, in which all interface functions are equality indicators:

1) Replace each interface function with an equality indicator.

2) Replace each hidden function $f_j$ with $\langle f_j, f_t : t \in T(j) \rangle$.

Proposition 2. In the above procedure, the original and resulting constrained NFGs are equivalent,
and have the same underlying graph. 

Proof. That the two NFGs have the same underlying graph is clear. To prove equivalence, note that each interface function $f_i$ is the product $\prod_{t \in T(i)} f_t(x_i, s_t)$ of bivariate functions $f_t$, and from Proposition 1 it can be written as the sum-of-products form $\langle \prod_{t \in T(i)} f_t(s'_t, s_t), \delta_=(x_i, s'_{T(i)}) \rangle$. Upon vertex splitting, each interface vertex can be replaced with the NFG representing its corresponding sum-of-products form. The claim follows upon vertex merging each hidden node $f_j$ with its adjacent bivariate functions, i.e., by replacing $f_j$ with $\langle f_j, f_t : t \in T(j) \rangle$. □




Figure 9: Converting a constrained NFG model to one in which all interface functions are equality indicators: (a) an example of a constrained NFG where, by assumption, $g$ splits into $g_1$, $g_2$ and $g_3$, and $h$ splits into $h_1$ and $h_2$; (b) vertex splitting of interface nodes, followed by vertex merging of each hidden function with its neighboring bivariate functions.

The conversion stated in the proposition and an illustration of the proof are shown in Figure 9. Invoking Proposition 1 on the NFG produced by Proposition 2, the following theorem is immediate.

Theorem 1. Every constrained NFG can be converted to an equivalent FG. 



4.3 Generative NFG model 

A generative NFG model is an NFG model in which every interface function is a conditional function of its half-edge variable given its remaining arguments. The following example gives some insight into the modelling semantics of a generative NFG.



Figure 10: Example 2.
Example 2. Let $(S_1, S_2)$ be jointly distributed according to $p_{S_1S_2}$ and $S_3$ be distributed according to $p_{S_3}$, where $(S_1, S_2)$ is independent of $S_3$. Suppose that $(X_1, X_2)$ depends on $(S_1, S_2, S_3)$ according to the conditional distribution $p_{X_1X_2|S_1S_2S_3}(x_1, x_2 | s_1, s_2, s_3) := p_{X_1|S_1}(x_1|s_1)\, p_{X_2|S_2S_3}(x_2|s_2, s_3)$. It is then easy to verify that the joint distribution $p_{X_1X_2}(x_1, x_2)$ is given by the sum-of-products form $\langle p_{S_1S_2}, p_{S_3}, p_{X_1|S_1}, p_{X_2|S_2S_3} \rangle$, and hence the RVs $(X_1, X_2)$ are represented by the generative NFG in Fig. 10.
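To make Example 2 concrete, the following Python sketch evaluates the sum-of-products form $\langle p_{S_1S_2}, p_{S_3}, p_{X_1|S_1}, p_{X_2|S_2S_3} \rangle$ by brute force over the internal variables. The binary alphabets and the random tables are illustrative assumptions, not data from the paper; the point is only that summing the product of latent and interface functions over $s_1, s_2, s_3$ yields a distribution of $(X_1, X_2)$.

```python
import itertools
import random

# Illustrative assumptions: every RV is binary and all tables are random numbers.
random.seed(0)
B = (0, 1)

def random_dist(n_vars):
    """Random probability table over {0,1}^n_vars, keyed by tuples."""
    keys = list(itertools.product(B, repeat=n_vars))
    weights = [random.random() + 0.1 for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}

def random_conditional(n_parents):
    """Random table c[(x,) + parents] whose sum over x is 1 for every parent value."""
    table = {}
    for parents in itertools.product(B, repeat=n_parents):
        w0, w1 = random.random() + 0.1, random.random() + 0.1
        table[(0,) + parents] = w0 / (w0 + w1)
        table[(1,) + parents] = w1 / (w0 + w1)
    return table

p_s1s2 = random_dist(2)            # latent function p_{S1S2}, key (s1, s2)
p_s3 = random_dist(1)              # latent function p_{S3}, key (s3,)
p_x1_s1 = random_conditional(1)    # interface function p_{X1|S1}, key (x1, s1)
p_x2_s2s3 = random_conditional(2)  # interface function p_{X2|S2S3}, key (x2, s2, s3)

def exterior(x1, x2):
    """Sum-of-products form: sum the product of all four local functions over s1, s2, s3."""
    return sum(
        p_s1s2[(s1, s2)] * p_s3[(s3,)] * p_x1_s1[(x1, s1)] * p_x2_s2s3[(x2, s2, s3)]
        for s1, s2, s3 in itertools.product(B, repeat=3)
    )

p_x1x2 = {(x1, x2): exterior(x1, x2) for x1, x2 in itertools.product(B, repeat=2)}
```

The resulting table sums to one, and its $X_1$-marginal equals $\sum_{s_1} p_{S_1}(s_1)\, p_{X_1|S_1}(x_1|s_1)$, as the generative reading of the NFG predicts.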

The NFG in this example is a generative NFG model, where $p_{X_1|S_1}$ and $p_{X_2|S_2S_3}$ are interface functions and $p_{S_1S_2}$ and $p_{S_3}$ are latent functions. In this case, the latent functions serve as independent sources of randomness, which "generate" the internal RVs ($S_1$, $S_2$ and $S_3$). The internal RVs then "generate" the external RVs via the interface functions.

In an arbitrary generative NFG model, every latent function may be viewed, up to a scaling factor, as the joint distribution of its involved internal RVs, so the latent functions can be regarded as independent "generating sources". Since each interface function is a conditional function, or, up to scale, a conditional distribution of the external RV given its internal RVs, the product of these conditional functions may be regarded as the conditional distribution of all external RVs conditioned on the internal RVs. The product of all local functions is then, up to scale, the joint distribution of all
external and internal RVs. The semantics of NFG then dictates that the joint distribution shall be 
summed over all internal variables, and the resulting exterior function is therefore the distribution 
of the external RVs, up to scale. In a sense, a generative NFG model describes how the external 
random variables are generated from some independent hidden sources. 

4.4 A subclass of generative NFG models is equivalent to CFGs 

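In this section, we rely on the Fourier transform in some of the discussions. Let $\mathcal{X}$ be a finite abelian group. We use $\mathcal{X}^\wedge$ to denote the character group (written additively) of $\mathcal{X}$, defined as the set of homomorphisms from $\mathcal{X}$ to the multiplicative group $\mathbb{C}^\times$. It is well known that $(\mathcal{X}^\wedge)^\wedge$ is isomorphic to $\mathcal{X}$, and for any $x \in \mathcal{X}$ and $\hat{x} \in \mathcal{X}^\wedge$, $\hat{x}(x) = x(\hat{x})$. We use $\kappa_{\mathcal{X}}(x, \hat{x})$ to denote both $\hat{x}(x)$ and $x(\hat{x})$, and use $\hat{\kappa}_{\mathcal{X}}(\hat{x}, x)$ to denote $\kappa_{\mathcal{X}}(x, -\hat{x})/|\mathcal{X}|$. For any function $f$ on $\mathcal{X}$, we define its Fourier transform as the sum-of-products form $\hat{f}(\hat{x}) := \langle \kappa_{\mathcal{X}}(x, \hat{x}), f(x) \rangle$ for all $\hat{x} \in \mathcal{X}^\wedge$. It is not hard to show that $\kappa_{\mathcal{X}}$ and $\hat{\kappa}_{\mathcal{X}}$ are an inverse pair, and hence, given $\hat{f}$, one may recover $f$ using the Fourier inverse $f(x) = \langle \hat{f}(\hat{x}), \hat{\kappa}_{\mathcal{X}}(\hat{x}, x) \rangle$ for all $x \in \mathcal{X}$. It is well known that if $\mathcal{X}$ is the direct product $\prod_{i \in I} \mathcal{X}_i$ of the finite abelian groups $\mathcal{X}_i$, then $\mathcal{X}^\wedge$ is the direct product $\prod_{i \in I} \mathcal{X}_i^\wedge$, and it follows that $\kappa_{\mathcal{X}}(x, \hat{x}) = \prod_{i \in I} \kappa_{\mathcal{X}_i}(x_i, \hat{x}_i)$ for all $(x, \hat{x}) \in \mathcal{X} \times \mathcal{X}^\wedge$, and similarly for $\hat{\kappa}_{\mathcal{X}}$.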
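For readers who wish to check these definitions numerically, the following sketch instantiates them for the cyclic group $\mathbb{Z}_q$, whose characters are $\hat{x}(x) = e^{2\pi i x \hat{x}/q}$, so that $\mathbb{Z}_q^\wedge$ is identified with $\mathbb{Z}_q$. The choice of group and of $q = 5$ is an assumption made for illustration; the code verifies that $\kappa$ and $\hat{\kappa}$ form an inverse pair and that the Fourier inverse recovers $f$.

```python
import cmath

q = 5  # illustrative assumption: X is the cyclic group Z_q

def kappa(x, xh):
    """kappa_X(x, xh): the character indexed by xh evaluated at x, exp(2*pi*i*x*xh/q)."""
    return cmath.exp(2j * cmath.pi * x * xh / q)

def kappa_hat(xh, x):
    """Inverse kernel kappa_X(x, -xh) / |X|."""
    return kappa(x, -xh) / q

def fourier(f):
    """hat f(xh) = <kappa_X(x, xh), f(x)>, a sum over x in X."""
    return [sum(kappa(x, xh) * f[x] for x in range(q)) for xh in range(q)]

def inverse_fourier(fh):
    """f(x) = <hat f(xh), kappa_hat(xh, x)>, a sum over xh in the character group."""
    return [sum(fh[xh] * kappa_hat(xh, x) for xh in range(q)) for x in range(q)]

# Summing out the shared variable of kappa and kappa_hat should give delta_=(x, x').
pair = [[sum(kappa(x, xh) * kappa_hat(xh, xp) for xh in range(q)) for xp in range(q)]
        for x in range(q)]

f = [0.1, 0.4, 0.2, 0.2, 0.1]
f_back = inverse_fourier(fourier(f))
```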

Suppose that a generative NFG model is such that every interface function is a sum indicator 
function. We may convert such an NFG to a CFG according to the following procedure: For each 
interface vertex, replace it by a variable vertex representing its half-edge variable and remove the 
half edge. 

Proposition 3. If in a generative NFG model all interface functions are sum indicators, then the 
above procedure gives rise to a CFG equivalent to the NFG. 

Proof. The proof follows these steps. 1) Modify the NFG by replacing each interface function with a parity-check indicator and inserting a degree-two parity-check indicator (a sign inverter) on each half edge, Fig. 11(b). (This does not alter the exterior function, due to the relation between the sum and the parity indicator functions.) 2) Perform a holographic transformation on the resulting NFG by inserting the inverse pair $\kappa_{\mathcal{X}_e}$ and $\hat{\kappa}_{\mathcal{X}_e}$ into each regular edge $e$ (with $\kappa_{\mathcal{X}_e}$ adjacent to a hidden function or an inserted sign inverter), and inserting the transformer $\kappa_{\mathcal{X}_e}$ into each dangling edge $e$, Fig. 11(c). 3) By noting that (up to a scaling factor; see footnote 2) the Fourier transform and the Fourier inverse of $\delta_+$ are $\delta_=$, we obtain (after deleting all degree-two equality indicators resulting from the sign inverters) a constrained NFG in which each interface function is an equality indicator and each hidden function is the Fourier transform of the corresponding hidden function in the original NFG, Fig. 11(d). Hence, from the GHT and Proposition 1, we have $\hat{Z}_{\mathcal{G}}(\hat{x}_I) = \prod_{j \in J} \hat{f}_j(\hat{x}_{\mathrm{ne}(j)})$, and the claim follows from the multivariate multiplication-convolution duality theorem under the Fourier transform [6]. □
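The duality invoked at the end of the proof can be checked in its simplest, one-dimensional form. Assuming the cyclic group $\mathbb{Z}_q$ with exponential characters (an illustrative choice), the sketch below verifies that the Fourier transform of a convolution, i.e., of a sum-of-products form taken against a sum indicator, equals the pointwise product of the Fourier transforms.

```python
import cmath

q = 6  # illustrative assumption: X = Z_q

def fourier(f):
    """hat f(xh) = sum_x exp(2*pi*i*x*xh/q) f(x)."""
    return [sum(cmath.exp(2j * cmath.pi * x * xh / q) * f[x] for x in range(q))
            for xh in range(q)]

def convolve(f, g):
    """(f * g)(x) = sum over s1 + s2 = x (mod q) of f(s1) g(s2), i.e. a sum against delta_Sigma."""
    h = [0.0] * q
    for s1 in range(q):
        for s2 in range(q):
            h[(s1 + s2) % q] += f[s1] * g[s2]
    return h

f = [0.3, 0.1, 0.0, 0.2, 0.25, 0.15]
g = [0.5, 0.0, 0.1, 0.1, 0.2, 0.1]
lhs = fourier(convolve(f, g))                       # transform of the convolution
rhs = [a * b for a, b in zip(fourier(f), fourier(g))]  # product of the transforms
```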




Figure 11: Proof of Proposition 3: (a) an example NFG, (b) Step 1, (c) Step 2, and (d) Step 3.

An example illustrating the steps of the proof is shown in Fig. 11. Of course, one may also prove the claim by directly evaluating the exterior function, as demonstrated in the following example.

Example 3. Consider the NFG in Figure 10, where the interface functions are taken as sum indicators. Then the exterior function of this NFG is

$$\sum_{s_1, s_2, s_3} p_{S_1S_2}(s_1, s_2)\, p_{S_3}(s_3)\, \delta_\Sigma(x_1, s_1)\, \delta_\Sigma(x_2, s_2, s_3) \overset{(a)}{=} \sum_{s_2, s_3} p_{S_1S_2}(x_1, s_2)\, p_{S_3}(s_3)\, \delta_\Sigma(x_2, s_2, s_3) \overset{(b)}{=} p_{S_1S_2}(x_1, x_2) \ast p_{S_3}(x_2),$$

where the convolution is taken with respect to the second argument of $p_{S_1S_2}$, (a) identifies the bivariate sum indicator with the equality indicator and removes it, and (b) is due to Lemma 2. The reader is invited to examine the structure of the original NFG and that of the CFG representing the above convolutional factorization.
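Example 3 can also be checked by brute force. The sketch below assumes that all variables take values in $\mathbb{Z}_3$ and uses random distributions (both illustrative choices); it compares the exterior function of the NFG of Example 3, with sum indicators as interface functions, against the convolution of $p_{S_1S_2}(x_1, \cdot)$ with $p_{S_3}$.

```python
import itertools
import random

q = 3  # illustrative assumption: every variable lives in Z_3
random.seed(1)
Zq = range(q)

def normalized(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

p_s1s2 = normalized({k: random.random() for k in itertools.product(Zq, repeat=2)})
p_s3 = normalized({k: random.random() for k in Zq})

def sum_indicator(x, *s):
    """delta_Sigma(x, s_1, ..., s_m) = [x = s_1 + ... + s_m] in Z_q."""
    return 1.0 if x == sum(s) % q else 0.0

def exterior(x1, x2):
    """Brute-force exterior function of the NFG of Example 3 (sum-indicator interfaces)."""
    return sum(
        p_s1s2[(s1, s2)] * p_s3[s3] * sum_indicator(x1, s1) * sum_indicator(x2, s2, s3)
        for s1, s2, s3 in itertools.product(Zq, repeat=3)
    )

def convolution_form(x1, x2):
    """p_{S1S2}(x1, .) convolved with p_{S3}, in the second argument."""
    return sum(p_s1s2[(x1, s2)] * p_s3[(x2 - s2) % q] for s2 in Zq)
```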

Indeed, for any generative NFG model in which all interface functions are sum indicators, the procedure preceding Proposition 3, applied to an interface function, is equivalent to either applying step (a) above (for degree-2 vertices) or applying Lemma 2 (for vertices of degree higher than 2).

It is easy to see that this procedure is reversible, in the sense that one may apply it in the reverse direction and convert any CFG to a generative NFG model with all interface functions being sum indicators. Figure 12 shows an equivalent pair of CFG and generative NFG model.

4.5 Independence 

We now show that there exists a "duality" between a constrained NFG model and a generative 
NFG model in their implied independence properties. 

2 It is not hard to show that all the scaling factors cancel out, and hence, all subsequent equalities are exact. 
Figure 12: An equivalent pair of CFG (left) and generative NFG model (right). 




Figure 13: (a) $X \perp Z \mid Y$ and (b) $X \perp Z$.

Lemma 4. For the NFG models in Fig. 13, we have $X \perp Z \mid Y$ in the constrained model and $X \perp Z$ in the generative model.

Proof. For the constrained NFG, it is sufficient to show that $p(x, y, z)$ is a split function via $y$; see, e.g., [15]. To this end, note that $g_2$ is a split function via $y$; say it splits into the bivariate functions $g_{2,1}$ and $g_{2,2}$. Hence, applying the vertex splitting procedure to $g_2$, followed by vertex merging each hidden function with its adjacent bivariate function, it becomes clear that $p(x, y, z) = f'_1(x, y)\, f'_2(y, z)$, where $f'_1 = \langle f_1, g_1, g_{2,1} \rangle$ and $f'_2 = \langle f_2, g_3, g_{2,2} \rangle$.

For the generative model, we prove the claim graphically in Fig. 14. Marginalizing $y$, the probability distribution $p(x, z)$ is realized by the NFG in Fig. 14(a), which, by the definition of a conditional function, is equivalent to the one in (b). Marginalizing again, we obtain the NFGs in (c) and (d) for the marginals $p(x)$ and $p(z)$, respectively. Hence, the product $p(x)p(z)$ is realized by the NFG in (e), which is equivalent (again by the definition of a conditional function) to the one in (f). Noting that the NFG on the left side of (f) realizes the scalar 1, and comparing with (a), we see that $p(x, z)$ and $p(x)p(z)$ are realized by the same NFG, and hence must be equal. □

We remark that the conditional independence part of the lemma can be proved graphically in a manner similar to the marginal independence part, where marginalization is replaced with evaluation. Further, it is interesting to observe that the two NFG models have identical graphs, but the constrained model implies a conditional independence property whereas the generative model implies the "dual" marginal independence property.
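Both halves of Lemma 4 can be checked numerically. The sketch below assumes our reading of Fig. 13: interface nodes $g_1, g_2, g_3$ carry $X, Y, Z$, the hidden node $f_1$ is joined to $g_1$ and $g_2$, the hidden node $f_2$ is joined to $g_2$ and $g_3$, and every edge carries a binary variable. The random tables are illustrative; the exterior function is normalized, so scaling factors play no role.

```python
import itertools
import random

random.seed(2)
B = (0, 1)  # illustrative assumption: all edge variables are binary

def table(n):
    """Random positive table over {0,1}^n keyed by tuples."""
    return {k: random.random() + 0.1 for k in itertools.product(B, repeat=n)}

def conditional(n_parents):
    """Random conditional table c[(x,) + parents] with sum over x equal to 1."""
    t = table(1 + n_parents)
    for par in itertools.product(B, repeat=n_parents):
        z = t[(0,) + par] + t[(1,) + par]
        t[(0,) + par] /= z
        t[(1,) + par] /= z
    return t

def joint(f1, f2, g1, g2, g3):
    """Normalized exterior function p(x, y, z). Assumed topology: hidden f1(a, b), f2(c, d);
    interface g1(x, a), g2(y, b, c), g3(z, d); internal edges a, b, c, d."""
    p = {}
    for x, y, z in itertools.product(B, repeat=3):
        p[(x, y, z)] = sum(
            f1[(a, b)] * f2[(c, d)] * g1[(x, a)] * g2[(y, b, c)] * g3[(z, d)]
            for a, b, c, d in itertools.product(B, repeat=4)
        )
    total = sum(p.values())
    return {k: v / total for k, v in p.items()}

def marginal(p, keep):
    """Marginalize p onto the coordinate positions listed in keep."""
    m = {}
    for k, v in p.items():
        key = tuple(k[i] for i in keep)
        m[key] = m.get(key, 0.0) + v
    return m

# Generative model: interface functions are conditionals (latent scaling is normalized away).
p_gen = joint(table(2), table(2), conditional(1), conditional(2), conditional(1))

# Constrained model: g2 splits via y as g21(y, b) * g22(y, c).
g21, g22 = table(2), table(2)
g2_split = {(y, b, c): g21[(y, b)] * g22[(y, c)] for y, b, c in itertools.product(B, repeat=3)}
p_con = joint(table(2), table(2), table(2), g2_split, table(2))
```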

Figure 14: Proof of Lemma 4 for the generative model.

In an NFG model, suppose that $A$, $B$ and $S$ are three disjoint subsets of vertices. We say $A$ and $B$ are separated by $S$ if every path from a vertex in $A$ to a vertex in $B$ goes through some vertex in $S$. In this case, if $A$, $B$ and $S$ are all subsets of the interface vertex set $I$, then, recalling that every external variable is also indexed by the interface vertex it is incident with, we also say that the RV sets $X_A$ and $X_B$ are separated by the RV set $X_S$. For any subset $I' \subseteq I$, let $\mathrm{ne}(I') := \bigcup_{i \in I'} \mathrm{ne}(i)$. Merge the vertices in $A$ into one vertex, and similarly for the interface nodes $S$ and $B' := I \setminus (A \cup S)$, and perform the same merging for the hidden nodes $\mathrm{ne}(A)$ and $J \setminus \mathrm{ne}(A)$. Then the resulting NFG has the same graph topology as the ones in Fig. 13, as it is clear from the separation property that $\mathrm{ne}(B') \subseteq J \setminus \mathrm{ne}(A)$. Since the split and conditional properties are preserved under such mergings, the previous lemma extends in a straightforward manner to any NFG model, and we have the following theorem.

Theorem 2. Let $\mathcal{G}(I \cup J, E, f_{I \cup J})$ be an NFG model and let $A$, $B$ and $S$ be three disjoint interface vertex subsets, i.e., subsets of $I$. Suppose that $X_A$ and $X_B$ are separated by $X_S$. Then

1. If the NFG is a constrained NFG model, then $X_A \perp X_B \mid X_S$.

2. If the NFG is a generative NFG model, then $X_A \perp X_B$.

Part 1 of Theorem 2 is essentially the global Markov property (see, e.g., [15]) of an FG model (noting that constrained NFG models are equivalent to FGs). Part 2 of the theorem, in the special case when all interface functions are sum indicators, was proved in [5] in the context of CFGs (noting that such NFG models reduce to CFGs). That is, Part 2 of Theorem 2 generalizes this result from CFGs to arbitrary generative NFG models. We now provide some insight into this result.

Consider the NFG in Figure 13(b). The independence $X \perp Z$ can be explained by the fact that the latent functions $f_1$ and $f_2$, giving rise to $X$ and $Z$ respectively, serve as independent sources of randomness. Indeed, it is precisely because $X$ and $Z$ share no common latent functions that they become independent when $Y$ is ignored. The same is true for arbitrary generative NFG models: if $X_A$ and $X_B$ are separated by $X_S$, then $X_A$ and $X_B$ necessarily share no common latent functions.
We remark that the marginal independence, i.e., Part 2 of Theorem 2, holds for the more general class of NFG models characterized by the property that for each interface function $f_i$ it holds that

$$\sum_{x_i} f_i(x_i, s_{T(i)}) = \prod_{t \in T(i)} f_t(s_t)$$

for some univariate functions $f_t$. We may refer to an NFG model whose interface functions satisfy this property as an extended generative model, and it is clear that a generative model is an extended generative model. (A conditional function trivially satisfies the property above.) It is not hard to show that the class of constrained models and the class of extended generative models are closed under internal holographic transformations, from which it follows that the independence properties in Theorem 2 are invariant under internal holographic transformations.



5 Transformed NFG Models

In some applications, instead of modelling the joint probability distribution of the RVs, we may wish our model to represent a certain transformation of the joint distribution, and the NFG modelling framework introduced in this paper is particularly convenient for this purpose. Moreover, this framework provides a generic transformation technique and enables an infinite family of such transformations. In subsequent discussions, a transformed NFG model, or simply a transformed model, refers to any NFG obtained from an NFG model (generative or constrained) by a holographic transformation, where the external transformers in Step (HI) of the holographic transformation are not necessarily trivial. In some places we refer to the original NFG as the base model.

Next we show that a particular class of NFG models, upon an appropriate choice of holographic transformation, results in CDNs. Let $\mathcal{X} := \{1, \ldots, |\mathcal{X}|\}$ and let the bivariate function $A_{\mathcal{X}}$ on $\mathcal{X} \times \mathcal{X}$ be such that $A_{\mathcal{X}}(x, x') = 1$ if $x' \le x$ and $A_{\mathcal{X}}(x, x') = 0$ otherwise. We call $A_{\mathcal{X}}$ a cumulus function. Let the bivariate function $D_{\mathcal{X}}$ on $\mathcal{X} \times \mathcal{X}$ be such that $D_{\mathcal{X}}(x, x') = 1$ if $x = x'$, $D_{\mathcal{X}}(x, x') = -1$ if $x = x' + 1$, and $D_{\mathcal{X}}(x, x') = 0$ otherwise. We call $D_{\mathcal{X}}$ a difference function.

In the case where $\mathcal{X}$ is the Cartesian product $\prod_{i \in I} \mathcal{X}_i$, where $\mathcal{X}_i := \{1, \ldots, |\mathcal{X}_i|\}$ and $I$ is a finite indexing set, the previous definitions are extended to the partially ordered set $\mathcal{X}$ in a component-wise manner by setting $A_{\mathcal{X}}(x_I, x'_I) := \prod_{i \in I} A_{\mathcal{X}_i}(x_i, x'_i)$ and $D_{\mathcal{X}}(x_I, x'_I) := \prod_{i \in I} D_{\mathcal{X}_i}(x_i, x'_i)$ for all $x_I, x'_I \in \mathcal{X}$. In our notation for cumulus and difference function vertices, we distinguish the first argument by using a dot to mark the corresponding edge, cf. Fig. 15(a).

The following lemma will be useful in characterizing CDNs as a subclass of transformed NFG models.

Lemma 5. Let $\mathcal{X} = \prod_{i \in I} \mathcal{X}_i$, where $\mathcal{X}_i := \{1, \ldots, |\mathcal{X}_i|\}$ for all $i$ in the finite indexing set $I$. Then,

1. $\langle A_{\mathcal{X}}(x, y), D_{\mathcal{X}}(y, x') \rangle = \delta_=(x, x')$.

2. For any set of RVs $X_J$, $\langle p_{X_J}(x'), A_{\mathcal{X}}(x, x') \rangle = F_{X_J}(x)$ for all $x \in \mathcal{X}$.

3. The two NFGs in Figure 15 are equivalent, where each edge variable assumes its values from $\mathcal{X}$.

Proof. Parts 1 and 2 are immediate from the definitions of $A_{\mathcal{X}}$ and $D_{\mathcal{X}}$. For Part 3, let $\mathcal{G}$ be as in Fig. 15(a). First we prove the result for $n = 3$ and $|I| = 1$, i.e., $\mathcal{X} = \{1, \ldots, |\mathcal{X}|\}$, where, by the definitions of the difference transform, the exterior function, and the max indicator function, we
have



$$Z_{\mathcal{G}}(x_1, x_2, x_3) = \begin{cases} A_{\mathcal{X}}(x_1, \max(x_2, x_3)) - A_{\mathcal{X}}(x_1, \max(x_2 + 1, x_3)) - A_{\mathcal{X}}(x_1, \max(x_2, x_3 + 1)) + A_{\mathcal{X}}(x_1, \max(x_2 + 1, x_3 + 1)), & x_2, x_3 < |\mathcal{X}|, \\ A_{\mathcal{X}}(x_1, \max(x_2, x_3)) - A_{\mathcal{X}}(x_1, \max(x_2 + 1, x_3)), & x_2 < |\mathcal{X}|,\ x_3 = |\mathcal{X}|, \\ A_{\mathcal{X}}(x_1, \max(x_2, x_3)) - A_{\mathcal{X}}(x_1, \max(x_2, x_3 + 1)), & x_2 = |\mathcal{X}|,\ x_3 < |\mathcal{X}|, \\ A_{\mathcal{X}}(x_1, \max(x_2, x_3)), & x_2 = x_3 = |\mathcal{X}|. \end{cases}$$

From the definition of the cumulus, it is clear that non-zero values of $Z_{\mathcal{G}}$ may occur only if $x_1 \ge \max(x_2, x_3)$. Assume $x_1 > \max(x_2, x_3)$, and note that in this case it is impossible to simultaneously have $x_2 = |\mathcal{X}|$ and $x_3 = |\mathcal{X}|$; then

$$Z_{\mathcal{G}}(x_1, x_2, x_3) = \begin{cases} 1 - 1 - 1 + 1, & x_2, x_3 < |\mathcal{X}|, \\ 1 - 1, & x_2 < |\mathcal{X}|,\ x_3 = |\mathcal{X}|, \\ 1 - 1, & x_2 = |\mathcal{X}|,\ x_3 < |\mathcal{X}|, \end{cases} \quad = 0.$$

Hence, assume $x_1 = \max(x_2, x_3)$, and further suppose $x_2 < x_3$; then

$$Z_{\mathcal{G}}(x_1, x_2, x_3) = \begin{cases} 1 - 1, & x_2, x_3 < |\mathcal{X}|, \\ 1 - 1, & x_2 < |\mathcal{X}|,\ x_3 = |\mathcal{X}|, \end{cases} \quad = 0.$$

By symmetry, we also have $Z_{\mathcal{G}}(x_1, x_2, x_3) = 0$ for $x_2 > x_3$. The only possibility left is $x_2 = x_3$, in which case it is clear that $Z_{\mathcal{G}}(x_1, x_2, x_3) = 1$. For $|I| > 1$, the claim follows from the fact that the cumulus, difference, max and equality functions are defined in a component-wise manner.

For general $n$, using the fact that $\max(x_2, x_3, \ldots, x_n) = \max(x_2, \max(x_3, \ldots \max(x_{n-1}, x_n) \ldots))$, we can express the NFG in Fig. 15(a) using the NFG in Fig. 16(a). Inserting the inverse pair $A_{\mathcal{X}}$ and $D_{\mathcal{X}}$ on each edge inside the dashed box in Fig. 16(a), we obtain the equivalent NFG in (b). Invoking the established part of the lemma for $n = 3$ under the vertex merging shown in (b), the claim follows. □
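A brute-force check of Part 3 of Lemma 5 for $n = 3$ and $|I| = 1$ follows. Since Fig. 15 is not reproduced here, the sketch assumes the arrangement used in the case analysis above: the max indicator $\delta_{\max}(y_1, y_2, y_3) = [y_1 = \max(y_2, y_3)]$ has a cumulus $A_{\mathcal{X}}(x_1, y_1)$ on the edge of $y_1$ and a difference transformer $D_{\mathcal{X}}(y_k, x_k)$, with its first argument facing the max indicator, on each of the other two edges. The alphabet size $N = 5$ is arbitrary.

```python
import itertools

N = 5  # illustrative alphabet X = {1, ..., N}
X = range(1, N + 1)

def A(x, xp):
    """Cumulus function: A(x, x') = 1 if x' <= x, else 0."""
    return 1 if xp <= x else 0

def D(x, xp):
    """Difference function: 1 if x = x', -1 if x = x' + 1, else 0."""
    if x == xp:
        return 1
    if x == xp + 1:
        return -1
    return 0

def max_indicator(y1, *ys):
    """delta_max(y1, y2, ..., yn) = [y1 = max(y2, ..., yn)]."""
    return 1 if y1 == max(ys) else 0

def Z(x1, x2, x3):
    """Exterior function: cumulus on the y1 edge, difference transformers on the y2, y3 edges."""
    return sum(
        A(x1, y1) * max_indicator(y1, y2, y3) * D(y2, x2) * D(y3, x3)
        for y1, y2, y3 in itertools.product(X, repeat=3)
    )
```

The check below confirms that $Z$ coincides with the trivariate equality indicator $\delta_=(x_1, x_2, x_3)$ on the whole alphabet.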
Figure 16: Proof of Part 3 of Lemma 5 for arbitrary $n$.



In this lemma, Part 1 suggests that $A_{\mathcal{X}}$ and $D_{\mathcal{X}}$ are an inverse pair of transformers, and Part 2 suggests that cumulus functions may serve to transform a probability distribution into a CDF.
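Parts 1 and 2 of Lemma 5 are equally easy to confirm numerically. The sketch below, with an arbitrary alphabet size and an arbitrary joint pmf on $\mathcal{X}_1 \times \mathcal{X}_2$ (both illustrative assumptions), checks that summing out $y$ in $A_{\mathcal{X}}(x, y) D_{\mathcal{X}}(y, x')$ gives $\delta_=(x, x')$, and that the component-wise cumulus turns a pmf into its joint CDF.

```python
import itertools

N = 4  # illustrative alphabet {1, ..., N} for each coordinate
X = range(1, N + 1)

def A(x, xp):
    """Cumulus function A(x, x') = [x' <= x]."""
    return 1 if xp <= x else 0

def D(x, xp):
    """Difference function: 1 if x = x', -1 if x = x' + 1, 0 otherwise."""
    return 1 if x == xp else (-1 if x == xp + 1 else 0)

# Part 1: summing out the shared variable y in A(x, y) D(y, x').
part1 = {(x, xp): sum(A(x, y) * D(y, xp) for y in X) for x in X for xp in X}

# Part 2 on X x X, using the component-wise cumulus A(x1, a) A(x2, b).
weights = {k: 1 + (3 * k[0] + k[1]) % 5 for k in itertools.product(X, repeat=2)}
total = sum(weights.values())
p = {k: w / total for k, w in weights.items()}  # an arbitrary joint pmf of (X1, X2)

def cdf_via_cumulus(x1, x2):
    return sum(p[(a, b)] * A(x1, a) * A(x2, b) for a, b in itertools.product(X, repeat=2))

def cdf_direct(x1, x2):
    return sum(v for (a, b), v in p.items() if a <= x1 and b <= x2)
```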

Consider a generative NFG in which each interface function is a max indicator and each hidden function is a probability distribution of the variables incident on it. Let $\mathcal{G}$ be the transformed model obtained from such a generative NFG by inserting the cumulus transformer $A_{\mathcal{X}_t}$ on each half edge $t$. The following procedure converts $\mathcal{G}$ into a CDN:

1) Replace each max indicator and its adjacent transformer with a variable node representing 
the transformer's half-edge variable, and delete the half edge. 

2) Replace each hidden function, which is a probability distribution, with the corresponding 
cumulative distribution. 

Theorem 3. If in a transformed model all external transformers are cumulus transformers, all interface functions are max indicators, and all hidden functions are probability distributions, then the above procedure gives rise to a CDN equivalent to the transformed model.

Proof. Perform a holographic transformation on the transformed model by inserting into each internal edge $e$ the inverse-pair transformers $A_{\mathcal{X}_e}$ and $D_{\mathcal{X}_e}$, with the cumulus facing the hidden function and the difference transformer facing the max indicator. Merging each hidden node with its neighboring cumulus transformers, by Part 2 of Lemma 5 the resulting node represents the desired CDF. Merging each max indicator with its neighboring difference transformers and the already existing cumulus, by Part 3 of Lemma 5 we arrive at a constrained NFG in which each interface function is an equality indicator, and the claim follows by Proposition 1. □

Figure 17 demonstrates the proof on an example NFG.
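The following sketch checks Theorem 3 on a small hypothetical example, not the one in Fig. 17: hidden distributions $f_1(a, b)$ and $f_2(c)$, a degree-2 max indicator giving $X_1 = a$, and a max indicator giving $X_2 = \max(b, c)$. Inserting a cumulus on each half edge turns the exterior function into the joint CDF of $(X_1, X_2)$, which should coincide with the CDN product $F_1(x_1, x_2) F_2(x_2)$ of the CDFs of the hidden functions.

```python
import itertools
import random

random.seed(3)
N = 3  # illustrative alphabet {1, ..., N} for every variable
X = range(1, N + 1)

def normalized(d):
    t = sum(d.values())
    return {k: v / t for k, v in d.items()}

# Hypothetical generative NFG: hidden distributions f1(a, b) and f2(c);
# interface max indicators X1 = max(a) and X2 = max(b, c).
f1 = normalized({k: random.random() for k in itertools.product(X, repeat=2)})
f2 = normalized({c: random.random() for c in X})

def p_x1x2(x1, x2):
    """Exterior function of the base (generative) model: the pmf of (X1, X2)."""
    return sum(f1[(a, b)] * f2[c] for a, b, c in itertools.product(X, repeat=3)
               if x1 == a and x2 == max(b, c))

def transformed(x1, x2):
    """Transformed model: a cumulus transformer on each half edge."""
    return sum(p_x1x2(y1, y2) for y1 in X for y2 in X if y1 <= x1 and y2 <= x2)

# CDN from the procedure above: each hidden function replaced by its CDF.
F1 = {(x1, x2): sum(f1[(a, b)] for a in X for b in X if a <= x1 and b <= x2)
      for x1 in X for x2 in X}
F2 = {x: sum(f2[c] for c in X if c <= x) for x in X}

def cdn(x1, x2):
    return F1[(x1, x2)] * F2[x2]
```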

Before we proceed, we remark that the independence properties implied by a transformed model are precisely the ones implied by its base NFG, i.e., the NFG obtained by ignoring all the external transformers. By the remark following Theorem 2, such independence properties are further invariant under internal holographic transformations. Since the base NFG of the transformed model in Theorem 3 is a generative model, it becomes clear that the independence properties implied by CDNs are the marginal independence ones, i.e., Part 2 of Theorem 2.

Besides CDNs, we also remark that it is possible to regard linear characteristic models (LCMs) [7] as a special case of transformed NFG models. In fact, the NFG in Fig. 11(d) is a transformed model that is equivalent to an LCM, and the holographic transformation presented in the proof of Proposition 3, Fig. 11(c), provides the basis for understanding an LCM as a transformed model whose base NFG is a generative NFG equivalent to a CFG, Fig. 11(a). We skip the details, and hope that the framework of transformed NFG models is clear enough for such an equivalence to be seen.

Figure 17: (a) An example of a transformed model in accordance with Theorem 3; (b) inserting the inverse-pair cumulus-difference transformers on each internal edge and vertex merging each node with its neighboring transformers; by the GHT, the resulting NFG is equivalent to the one in (a); (c) by Lemma 5, an NFG equivalent to the one in (b), where $F_i$ is the cumulus transform of $f_i$, which is a CDF if $f_i$ is a probability distribution; (d) by Proposition 1, a CDN equivalent to the NFG in (c).

6 Linear codes 

In this section we discuss the connection between NFG models and linear codes. The material in this section is well known [16]; nevertheless, we choose to address it in this work in light of the constrained and generative semantics.

Any code $C$, linear or not, can be described in terms of its membership function, namely the indicator function $\delta_C(y) := [y \in C]$ for all $y \in \mathcal{X}$, for some finite set $\mathcal{X}$. Clearly, $\delta_C$ may be viewed, up to a scaling factor, as a distribution function over $\mathcal{X}$, and in the case of linear codes, as we will see, it may be described in terms of a constrained or a generative NFG model.

A linear code of length $n$ and dimension $k$ over a finite field $\mathbb{F}$ is a $k$-dimensional subspace of $\mathbb{F}^n$. Classically, a linear code $C$ can be expressed in two dual ways. On the one hand, $C = \{f(x) : x \in \mathbb{F}^k\}$ for some linear function $f : \mathbb{F}^k \to \mathbb{F}^n$. This can also be written as $C = \{(f_1(x), \ldots, f_n(x)) : x \in \mathbb{F}^k\}$ for some linear maps $f_i : \mathbb{F}^k \to \mathbb{F}$ for all $i$. That is, the code indicator function $\delta_C(y)$, for all $y \in \mathbb{F}^n$, can be expressed as the sum-of-products form

$$\delta_C(y_1, \ldots, y_n) = \sum_{x \in \mathbb{F}^k} \prod_{i=1}^{n} [y_i = f_i(x)],$$

for all $y_i \in \mathbb{F}$. Clearly, each local function $[y_i = f_i(x)]$ is a conditional function of $y_i$ given $x$, and so the indicator function of $C$ can be realized using a generative NFG model. In fact, since each linear function $f_i : \mathbb{F}^k \to \mathbb{F}$ can be written as $f_i(x_1, \ldots, x_k) = a_{i1}x_1 + \cdots + a_{ik}x_k$ for some $a_{ij} \in \mathbb{F}$, it follows that each interface function is a sum indicator function involving $y_i$ and $x_j$ for all $j$ such that $a_{ij} \neq 0$. The role of the hidden functions is to provide replicas of each variable $x_j$ for all $j \in \{1, \ldots, k\}$, and hence they are taken as equality indicators. More explicitly, for each $x_j$, we have a hidden equality indicator of degree equal to the number of interface functions involving $x_j$. This guarantees that each variable appears in the desired number of interface functions while respecting the degree restrictions [11]. From this we arrive at a generative NFG with $n$ interface nodes (each a sum indicator), $k$ hidden nodes (each an equality indicator), and an edge connecting nodes $i$ and $j$ if and only if $a_{ij}$ is nonzero. Fig. 18(a) illustrates the generative NFG model for the Hamming code (with $n = 7$ and $k = 4$).
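The generator realization can be exercised directly. The sketch below uses one common generator matrix of the $(7, 4)$ Hamming code over GF(2); this particular matrix is an illustrative choice and need not match the labelling in Fig. 18(a). It evaluates $\delta_C$ as the sum over $x \in \mathbb{F}^k$ of the product of the interface indicators $[y_i = f_i(x)]$.

```python
import itertools

# Illustrative generator matrix G = [I_4 | P] of the (7, 4) Hamming code over GF(2).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
k, n = len(G), len(G[0])

def generator_indicator(y):
    """delta_C(y) = sum over x in F^k of prod_i [y_i = f_i(x)], f_i(x) = sum_j x_j G[j][i] mod 2."""
    total = 0
    for x in itertools.product((0, 1), repeat=k):
        term = 1
        for i in range(n):
            f_i = sum(x[j] * G[j][i] for j in range(k)) % 2  # interface node i: a sum indicator
            term *= 1 if y[i] == f_i else 0
        total += term
    return total

code = [y for y in itertools.product((0, 1), repeat=n) if generator_indicator(y)]
```

The realization returns 0 or 1 for every $y$ because the chosen $G$ has full rank, so each codeword arises from exactly one $x$; the 16 codewords found have minimum nonzero weight 3, as expected for a Hamming code.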

On the other hand, the parity-check interpretation of a linear code dictates that the elements of a linear code $C$ must satisfy a collection of homogeneous linear equations. That is, $C = \{y \in \mathbb{F}^n : f(y) = 0\}$ for some linear map $f : \mathbb{F}^n \to \mathbb{F}^{n-k}$. This can also be written as $C = \{y \in \mathbb{F}^n : (f_1(y), \ldots, f_{n-k}(y)) = 0\}$ for some linear maps $f_j : \mathbb{F}^n \to \mathbb{F}$. Hence, the code indicator function can be expressed as the product

$$\delta_C(y) = \prod_{j} [f_j(y) = 0],$$

which (since $f_j$ is linear) can further be simplified as $\delta_C(y_1, \ldots, y_n) = \prod_j \big[\sum_i a_{ij} y_i = 0\big]$ for some $a_{ij} \in \mathbb{F}$. From Proposition 1 we can see that the code indicator function is realized by a constrained NFG model in which each interface node $i$ is an equality indicator, each hidden node $j$ is a parity indicator, and there is an edge connecting nodes $i$ and $j$ if and only if $a_{ij}$ is nonzero, Fig. 19(a).
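Similarly, the parity realization can be compared with the generator realization. Assuming an illustrative generator matrix $G = [I_4 \mid P]$ of the $(7, 4)$ Hamming code and its companion parity-check matrix $H = [P^{\mathsf{T}} \mid I_3]$, the sketch below evaluates $\delta_C(y)$ as a product of parity indicators, one per hidden node, and checks that it accepts exactly the words generated by $G$.

```python
import itertools

# Illustrative (7, 4) Hamming code over GF(2): G = [I_4 | P] and H = [P^T | I_3].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
n = 7

def parity_indicator(y):
    """delta_C(y) = prod_j [sum_i a_ij y_i = 0]: one parity (hidden) node per row of H."""
    result = 1
    for row in H:
        result *= 1 if sum(a * yi for a, yi in zip(row, y)) % 2 == 0 else 0
    return result

# Codewords produced by the generator realization, y = x G over GF(2).
generated = {
    tuple(sum(x[j] * G[j][i] for j in range(4)) % 2 for i in range(n))
    for x in itertools.product((0, 1), repeat=4)
}
```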
Figure 18: (a) Generator realization of the Hamming code, and (b) parity realization of the dual Hamming code.

Figure 19: (a) Parity realization of the Hamming code, and (b) generator realization of the dual Hamming code.



In summary, a constrained NFG model of a linear code represents a parity realization, and a generative NFG model represents a generator realization. From the duality between the sum indicator and the equality indicator under the Fourier transform, the duality between a generator realization of a code and a parity realization of the dual code may be explained as follows: starting with a generative NFG model (generator realization) of a linear code, Fig. 18(a), and performing a holographic transformation with Fourier transformers, one obtains a constrained NFG model (a parity realization) of the dual code, Fig. 18(b). Conversely, starting with a constrained NFG model (a parity realization) of the code, Fig. 19(a), one ends with a generative NFG model (a generator realization) of the dual code, Fig. 19(b).
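The duality between the two realizations rests on the identity $\sum_{y \in C} (-1)^{y \cdot \hat{y}} = |C|\, [\hat{y} \in C^\perp]$; that is, up to the scale $|C|$, the Fourier transform over GF(2)$^n$ of $\delta_C$ is $\delta_{C^\perp}$. The sketch below verifies this for an illustrative pair of $(7, 4)$ Hamming matrices, taking the dual code to be the row space of the parity-check matrix $H$.

```python
import itertools

# Illustrative (7, 4) Hamming code over GF(2): G = [I_4 | P] and H = [P^T | I_3].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
n = 7

def span(rows):
    """All GF(2) linear combinations of the given rows, as a set of n-tuples."""
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(rows)):
        words.add(tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(n)))
    return words

C = span(G)       # the code (generator realization)
C_dual = span(H)  # the dual code, generated by the parity-check matrix

def fourier_of_indicator(yh):
    """Fourier transform over GF(2)^n: sum over y of (-1)^(y . yh) * delta_C(y)."""
    return sum((-1) ** (sum(a * b for a, b in zip(y, yh)) % 2) for y in C)
```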



7 Evaluation of the exterior function 

In this section we discuss the algorithmic aspect of evaluating the exterior function of NFGs. We start with the "elimination algorithm" for NFGs, which is essentially the well-known elimination algorithm for inference on undirected graphical models [17]. The elimination algorithm is exact, but its complexity depends on the order in which the elimination is performed (see footnote 3). A more efficient algorithm, but exact only on NFGs with no cycles, is the sum-product algorithm [3], which we discuss in the language of NFGs [18] in the second part of this section. Finally, we discuss an "indirect approach" for evaluating the exterior function, where a holographic transformation, if it exists, is used to convert the NFG into one that is more amenable to such computation.



7.1 Elimination Algorithm 

The following algorithm computes the exterior function of an arbitrary NFG (i.e., not necessarily bipartite).

Algorithm 1 (Elimination). Given an NFG $\mathcal{G}$:
While $\mathcal{G}$ is not a single node, do

{

Pick an adjacent pair of vertices $v_1$ and $v_2$ in $\mathcal{G}$;

Compute $f_{v_1v_2}(x_{(E(v_1) \cup E(v_2)) \setminus (E(v_1) \cap E(v_2))}) := \langle f_{v_1}, f_{v_2} \rangle$;
Update $\mathcal{G}$ by removing $f_{v_1}$ and $f_{v_2}$, and adding $f_{v_1v_2}$;

}

Evidently, this algorithm runs in a finite time and terminates with a single node whose function 
is the desired exterior function. This is essentially the vertex merging procedure applied recursively 
on pairs of adjacent vertices. Clearly, the elimination algorithm may equivalently be viewed as an 
elimination algorithm on the edges of the NFG, where in each step it eliminates all the edges 
between a pair of adjacent vertices. More precisely, one may say, the elimination algorithm is a 
merging algorithm on the nodes, and is an elimination algorithm on the edges of the NFG. In 
subsequent discussions, we will freely alternate between these two views. Note that even if we start with a simple NFG, parallel edges may still arise (see footnote 4) while applying the elimination algorithm. However, loops never arise, since in each step we eliminate all parallel edges at once.

Example 4. Let $\mathcal{G}$ be as in Fig. 20(a). Applying the elimination algorithm, we obtain:



3 The problem of finding the "best" elimination ordering is known to be NP-hard, where the term "best" is in the 
sense of minimizing the largest node-degree (of nodes that do not factor multiplicatively) arising while performing 
the elimination algorithm, and the minimization is over all possible orderings. 

4 It is not hard to see that parallel edges do not appear at any step of the elimination algorithm if and only if the 
NFG is cycle-free, i.e., is a tree. 
Figure 20: Elimination algorithm example.

Eliminating $y_1$ gives the NFG in Fig. 20(b) with $f_{12}(x_1, x_2, y_2, y_3) = \langle f_1(x_1, y_1, y_3), f_2(x_2, y_1, y_2) \rangle$, eliminating $y_2, y_3$ gives the NFG in Fig. 20(c) with $f_{123}(x_1, x_2, x_3) = \langle f_{12}(x_1, x_2, y_2, y_3), f_3(x_3, y_2, y_3) \rangle$, and it is easy to verify that $Z_{\mathcal{G}} = f_{123}$.
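A minimal pure-Python sketch of Algorithm 1 follows. Local functions are stored as dictionaries over tuples of edge values; an edge name that appears in exactly one node is treated as a dangling edge; all alphabets are $\{0, \ldots, q-1\}$; and the first adjacent pair found is merged, with no attempt at a good elimination ordering. The names `Node`, `merge` and `eliminate` are ours, not the paper's. The usage part builds random local functions shaped as in Example 4 and compares the result with direct summation over $y_1, y_2, y_3$.

```python
import itertools
import random

class Node:
    """A local function: an ordered tuple of edge names and a table over their values."""
    def __init__(self, edges, table):
        self.edges = tuple(edges)
        self.table = table  # dict: tuple of edge values (in self.edges order) -> number

def merge(u, v, q):
    """<f_u, f_v>: sum out all edges shared by u and v (every alphabet is {0, ..., q-1})."""
    shared = [e for e in u.edges if e in v.edges]
    out = [e for e in u.edges if e not in shared] + [e for e in v.edges if e not in shared]
    table = {}
    for x_out in itertools.product(range(q), repeat=len(out)):
        total = 0.0
        for x_sh in itertools.product(range(q), repeat=len(shared)):
            val = dict(zip(out, x_out))
            val.update(zip(shared, x_sh))
            total += u.table[tuple(val[e] for e in u.edges)] * v.table[tuple(val[e] for e in v.edges)]
        table[x_out] = total
    return Node(out, table)

def eliminate(nodes, q):
    """Algorithm 1: merge adjacent pairs until one node remains (the NFG must be connected)."""
    nodes = list(nodes)
    while len(nodes) > 1:
        i, j = next((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
                    if set(nodes[i].edges) & set(nodes[j].edges))
        merged = merge(nodes[i], nodes[j], q)
        nodes = [nd for idx, nd in enumerate(nodes) if idx not in (i, j)] + [merged]
    return nodes[0]

# Usage on local functions shaped as in Example 4 (random tables, binary alphabets).
random.seed(4)
q = 2

def rand_node(edges):
    return Node(edges, {k: random.random() for k in itertools.product(range(q), repeat=len(edges))})

f1 = rand_node(("x1", "y1", "y3"))
f2 = rand_node(("x2", "y1", "y2"))
f3 = rand_node(("x3", "y2", "y3"))
Z = eliminate([f1, f2, f3], q)

def brute_force(x1, x2, x3):
    return sum(f1.table[(x1, y1, y3)] * f2.table[(x2, y1, y2)] * f3.table[(x3, y2, y3)]
               for y1, y2, y3 in itertools.product(range(q), repeat=3))
```

Here the first merge removes $y_1$ and the second removes the parallel pair $y_2, y_3$ at once, mirroring Example 4.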

For any node $v$ in an NFG, let $\deg(v) := |E(v)|$ be the number of external and internal edges incident on $v$. The following lemma determines the complexity of eliminating a pair of adjacent vertices in an NFG.

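Lemma 6. The complexity of eliminating a pair of adjacent vertices $u, v$ in the elimination algorithm, where every edge variable takes values in $\mathcal{X}$, is of order $|\mathcal{X}|^{\deg(u) + \deg(v) - |E(u) \cap E(v)|}$.

Proof. For each $x \in \mathcal{X}^{(E(u) \cup E(v)) \setminus (E(u) \cap E(v))}$ we need $|\mathcal{X}|^{|E(u) \cap E(v)|}$ computations, and the claim follows by noting that $|(E(u) \cup E(v)) \setminus (E(u) \cap E(v))| = |E(u)| + |E(v)| - 2|E(u) \cap E(v)|$. □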
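The count in Lemma 6 can be reproduced by literally counting the inner-loop iterations of a naive pairwise merge. In the illustrative helper below (a hypothetical name), `shared` plays the role of $|E(u) \cap E(v)|$ and every edge alphabet is assumed to have size `q`.

```python
import itertools

def merge_cost(deg_u, deg_v, shared, q):
    """Count the multiply-accumulate steps of a naive merge of adjacent nodes u and v.

    The outer loop runs over the surviving edges (deg_u + deg_v - 2 * shared of them);
    the inner loop sums over the shared edges. Every alphabet is assumed to have size q.
    """
    steps = 0
    for _ in itertools.product(range(q), repeat=deg_u + deg_v - 2 * shared):
        for _ in itertools.product(range(q), repeat=shared):
            steps += 1  # one product f_u * f_v added to the running sum
    return steps
```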

The following example shows that for some indicator functions of interest, the elimination complexity can be reduced significantly.

Example 5. Consider the NFG in Fig. 21(a), where each edge is associated with the same alphabet $\mathcal{X}$. In general, the complexity of computing the exterior function is of order $|\mathcal{X}|^{n+1}$. In the special case where $f$ is the indicator function $\delta_\Sigma$ or $\delta_{\max}$, the complexity is $(n-1)|\mathcal{X}|^2$. This is because such indicator functions may further be factorized as shown in Figs. 21(b) and (c). To see this, the elimination algorithm may start by eliminating $t_1, \ldots, t_n$, which induces no complexity, as each elimination simply amounts to pointing to the proper entry of each function $f_i$, resulting in Fig. 21(d), where $f'_i$ is properly defined according to $\delta_\Sigma$ or $\delta_{\max}$. Now eliminating each $e_j$ (starting with $e_{n-1}$) costs $|\mathcal{X}|^2$ computations, giving rise to $(n-1)|\mathcal{X}|^2$ computations in total. Note that if $f = \delta_=$, then the complexity of computing the exterior function is $(n-1)|\mathcal{X}|$. (For each $x \in \mathcal{X}$, we need $n - 1$ multiplications.)
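The savings claimed in Example 5 can be illustrated as follows, under the assumption (our reading of Fig. 21) that each neighbour $f_i$ is a univariate function of $t_i$ and that the sum indicator acts in $\mathbb{Z}_q$. Brute force sums over all $q^n$ tuples $(t_1, \ldots, t_n)$ for each value of the dangling variable, while the chained form eliminates one internal edge at a time at a cost of $q^2$ per step, $(n-1)q^2$ overall; the two must agree for both the sum and the max indicator.

```python
import itertools
import random

random.seed(5)
q, n = 4, 4  # illustrative: alphabet {0, ..., q-1} and n neighbours f_1, ..., f_n of f
fs = [[random.random() for _ in range(q)] for _ in range(n)]  # assumed univariate f_i(t_i)

def add_mod_q(a, b):
    return (a + b) % q

def brute_force(x, combine):
    """Sum over all (t_1, ..., t_n) whose combination equals x: q^n terms per value of x."""
    total = 0.0
    for t in itertools.product(range(q), repeat=n):
        acc = t[0]
        for ti in t[1:]:
            acc = combine(acc, ti)
        if acc == x:
            prod = 1.0
            for f, ti in zip(fs, t):
                prod *= f[ti]
            total += prod
    return total

def chained(combine):
    """Chain factorization: eliminate one internal edge at a time, q^2 work per step."""
    msg = list(fs[0])
    for f in fs[1:]:
        new = [0.0] * q
        for a in range(q):      # value on the previous internal edge
            for b in range(q):  # value of the next t_i
                new[combine(a, b)] += msg[a] * f[b]
        msg = new
    return msg  # msg[x] is the exterior function evaluated at x
```

Passing the built-in `max` as `combine` gives the $\delta_{\max}$ case, and `add_mod_q` gives the $\delta_\Sigma$ case.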

Figure 21: Example 5: the complexity of eliminating a sum or max indicator function.

It is not hard to see that one may impose an elimination ordering such that the elimination algorithm may be viewed as one that merges a node with its neighbors, picks another node and merges it with its neighbors, and so on, until there is only one node left. We refer to such elimination as "block elimination," and to the elimination algorithm at the block level as the "block elimination algorithm," i.e., in each step, the block elimination algorithm replaces a function $f_i$ and its neighbors with the function $\langle f_i, f_j : j \in \mathrm{ne}(i) \rangle$. In the special case where each block-eliminated function $f_i$ is the equality indicator and none of the neighbors of $f_i$ share any edges, cf. Fig. 22 and Example 6, the block elimination becomes the classical elimination algorithm of undirected graphical models. (An undirected graphical model can be converted to an FG, which by Proposition 1 is equivalent to a constrained NFG whose interface functions are equality indicators; note that none of the neighbors of such equality nodes share any edges, due to the bipartite nature of an NFG model.) We remark that whereas the elimination algorithm does not preserve the bipartite structure of an NFG in each edge-elimination step, at the block elimination level the algorithm respects such bipartite structure. (Let $I$ and $J$ be the two independent vertex sets, and let $f_{\mathrm{ne}(i)}$ be the node resulting from merging node $i \in I$ and its neighbors. If node $j \in J$ were connected to $f_{\mathrm{ne}(i)}$ by some edge $e$, then $e \in E(i)$ or $e \in E(j')$ for some $j' \in \mathrm{ne}(i)$. Both cases are impossible, since $e \in E(i)$ implies $j \in \mathrm{ne}(i)$, so $j$ would have been merged with $i$ in the block elimination, and $e \in E(j')$ violates the bipartite assumption.)

The following example addresses the complexity of a block elimination. 

Example 6. Let $\mathcal{G}$ be as in Fig. 23, where $E_1, \ldots, E_n$ are disjoint. This NFG may be understood as a sub-NFG: computing its exterior function corresponds to a block elimination of a node and its neighbors in a bigger NFG. Note that if the bigger NFG is bipartite, then the requirement that $E_1, \ldots, E_n$ be disjoint is automatically satisfied. Assuming the variable of each edge incident on $f$ takes its values from $\mathcal{X}$, by recursive application of Lemma ?? the complexity of computing the exterior function $Z_{\mathcal{G}}(x_{E_1}, \ldots, x_{E_n})$ is given by (assuming the elimination order $x_1, x_2, \ldots, x_n$)

$$\sum_{i=1}^{n} |\mathcal{X}|^{1+n-i+\sum_{j=1}^{i}|E_j|}.$$

That is, if $E_i$ is non-empty for all $i$, then the complexity is of order $|\mathcal{X}|^{1+|E_1|+\cdots+|E_n|}$. As an example, consider $n = 3$; then the exterior function of $\mathcal{G}$ is given by

$$Z_{\mathcal{G}}(x_{E_1}, x_{E_2}, x_{E_3}) = \overbrace{\Big\langle f_3, \underbrace{\big\langle f_2, \underbrace{\langle f_1, f\rangle}_{\text{SP1}}\big\rangle}_{\text{SP2}}\Big\rangle}^{\text{SP3}}.$$

Computing the first sum of products (SP1), i.e., eliminating $x_1$, involves $|\mathcal{X}|^{|E_1|+3}$ computations by Lemma ??, and similarly, one needs $|\mathcal{X}|^{|E_1|+|E_2|+2}$ operations to compute SP2. Finally, SP3 requires $|\mathcal{X}|^{|E_1|+|E_2|+|E_3|+1}$ operations.

Figure 22: Block elimination algorithm: (a) an example NFG $\mathcal{G}$; (b), (c), (d), and (e) illustrate the block elimination of nodes $q_1$, $q_2$, $q_3$, and $q_4$, respectively, and it is not hard to see that $Z_{\mathcal{G}}(x_A)$ is the function of the single node remaining in (e). Note that if $\mathcal{G}$ is obtained from a constrained NFG by gluing functions $g_1$, $g_2$, and $g_3$ onto the external edges $x_1$, $x_2$, and $x_3$ of the constrained NFG, then in the special case where every $g_i$ is the constant-one indicator, it follows that $Z_{\mathcal{G}}$ is the marginal probability distribution $p_{X_A}$, cf. Section ??.
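The operation counts in Example 6 can be tabulated directly. The short Python helper below (the name `block_cost` is ours) evaluates the $i$-th summand $|\mathcal{X}|^{1+n-i+|E_1|+\cdots+|E_i|}$ for the elimination order $x_1, \ldots, x_n$, so its output can be checked against the SP1, SP2 and SP3 counts stated for $n = 3$.

```python
def block_cost(q, sizes):
    """Per-step costs q**(1 + n - i + |E_1| + ... + |E_i|), i = 1..n.

    q is the alphabet size |X| and sizes = [|E_1|, ..., |E_n|].
    """
    n = len(sizes)
    costs, acc = [], 0
    for i, size in enumerate(sizes, start=1):
        acc += size                      # |E_1| + ... + |E_i|
        costs.append(q ** (1 + n - i + acc))
    return costs
```

With $|E_1| = \cdots = |E_n| = 1$ every step costs $|\mathcal{X}|^{n+1}$, so the total is $n|\mathcal{X}|^{n+1}$, the case used for transforming a function.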

Next we consider the transformation of a function. Suppose we are interested in the exterior function of the NFG in Fig. 24(a), where $x$ and $x'$ take their values from a finite alphabet $\mathcal{X}^n$ for some positive integer $n$. Clearly the exterior function represents a matrix-vector multiplication, and hence may be computed in $|\mathcal{X}|^{2n}$ operations. In the case where the bivariate function $f'$ (on $\mathcal{X}^n \times \mathcal{X}^n$) factors as the product of bivariate functions $f'_1, \ldots, f'_n$ (on $\mathcal{X} \times \mathcal{X}$), it follows from the previous example that the complexity of transforming the function $f$ via the transformers $f'_1, \ldots, f'_n$ is $n|\mathcal{X}|^{n+1}$. (This is the special case $|E_1| = \cdots = |E_n| = 1$; compare Figs. 23 and 24(b).) Another way of viewing this is that the elimination of each $x_i$ amounts to a matrix-vector multiplication for a given configuration $x_{N\setminus\{i\}}$, where $N := \{1, \ldots, n\}$, and hence requires $|\mathcal{X}|^2$ computations. That is, eliminating $x_i$ requires $|\mathcal{X}|^{(n-1)+2}$ operations, and eliminating $x_1, \ldots, x_n$ requires $n|\mathcal{X}|^{n+1}$ operations in total, as observed.

Figure 23: Example 6.
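A Python sketch comparing the two ways of computing the transform, assuming $\mathcal{X}=\{0,\ldots,q-1\}$, $f$ stored as a dictionary over $\mathcal{X}^n$, and `kernels[i][x][xp]` playing the role of $f'_i(x_i, x'_i)$ (the argument order is our assumption). The direct version touches $q^{2n}$ pairs; the edge-by-edge version eliminates $x_1, \ldots, x_n$ in turn, with $q$ multiply-adds per entry, i.e., $n q^{n+1}$ in total.

```python
import itertools


def transform_direct(f, kernels, q, n):
    """F(x') = sum_x f(x) * prod_i K_i(x_i, x'_i), summing over all q**n x for each x'."""
    F = {}
    for xp in itertools.product(range(q), repeat=n):
        total = 0
        for x in itertools.product(range(q), repeat=n):
            p = f[x]
            for i in range(n):
                p *= kernels[i][x[i]][xp[i]]
            total += p
        F[xp] = total
    return F


def transform_edgewise(f, kernels, q, n):
    """Eliminate x_1, ..., x_n one edge at a time (n * q**(n+1) multiply-adds)."""
    s = dict(f)
    for i in range(n):
        new = {}
        # cfg holds x'_1..x'_i, then x'_{i+1} at position i, then untransformed x_{i+2}..x_n
        for cfg in itertools.product(range(q), repeat=n):
            total = 0
            for xi in range(q):       # one matrix-vector row for this configuration
                src = cfg[:i] + (xi,) + cfg[i + 1:]
                total += kernels[i][xi][cfg[i]] * s[src]
            new[cfg] = total
        s = new
    return s
```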




Figure 24: Transformation of a function. 



We emphasize that if the transformers exhibit a special form, then more efficient computations may be achieved. For instance, if each $f'_i$ is a Fourier kernel, then the fast Fourier transform may be used in the elimination of each edge using $\log(|\mathcal{X}|)|\mathcal{X}|^n$ computations, giving rise to a total of $n\log(|\mathcal{X}|)|\mathcal{X}|^n$ operations for computing the transformation of $f$. Another important case where such savings are possible is when each $f'_i$ is the cumulus or the difference transform. In such a case, the matrix-vector multiplication associated with the elimination of each edge may be performed using $|\mathcal{X}|$ operations, giving rise to a total complexity of $n|\mathcal{X}|^n$. To see this, consider our example with $n = 3$ and assume $f'_1, f'_2, f'_3$ are cumulus transforms. To eliminate $x_1$, we start with the initialization $s := f$, and for $x'_1 = 2, \ldots, |\mathcal{X}|$, we update $s$ as

$$s(x'_1, x_2, x_3) = f(x'_1, x_2, x_3) + s(x'_1 - 1, x_2, x_3).$$

Hence, computing SP1 requires $|\mathcal{X}|^3$ operations instead of $|\mathcal{X}|^4$. Similarly, we compute SP2 and SP3, giving rise to a total of $3|\mathcal{X}|^3$ operations. This clearly extends to any $n$, and below we provide two algorithms for fast computation of the cumulus and difference transformations, where, to facilitate notation, we use $j^-$ to denote the set $\{1, \ldots, j-1\}$ and $j^+$ to denote $\{j+1, \ldots, n\}$ for any $j \in \{1, \ldots, n\}$.



Algorithm 2 (Fast cumulus transform).

Initialize: $s_0 = f$;
For $j = 1, \ldots, n$ {
    For each $(x'_{j^-}, x_{j^+}) \in \mathcal{X}^{n-1}$ {
        Initialize: $s_j(x'_{j^-}, x'_j = 1, x_{j^+}) = s_{j-1}(x'_{j^-}, x_j = 1, x_{j^+})$;
        For $x'_j = 2, \ldots, |\mathcal{X}|$ {
            $s_j(x'_{j^-}, x'_j, x_{j^+}) = s_{j-1}(x'_{j^-}, x'_j, x_{j^+}) + s_j(x'_{j^-}, x'_j - 1, x_{j^+})$;
        }
    }
}
Return $s_n$;
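Algorithm 2 can be sketched in Python as follows, assuming $f$ is stored as a dictionary over $\{0,\ldots,q-1\}^n$ (0-indexed, whereas the text counts from 1 to $|\mathcal{X}|$). Each pass runs a prefix sum along one coordinate, so the result is $F(x') = \sum_{x \le x'} f(x)$ with $\le$ taken coordinatewise.

```python
import itertools


def _with(rest, j, v):
    """Insert value v as coordinate j into the (n-1)-tuple rest."""
    return rest[:j] + (v,) + rest[j:]


def fast_cumulus(f, q, n):
    """Algorithm 2: F(x') = sum of f(x) over all x <= x' (coordinatewise).

    Uses (q - 1) * q**(n - 1) additions per coordinate, i.e. fewer than n * q**n in total.
    """
    s = dict(f)                                  # s_0 = f
    for j in range(n):                           # transform coordinate j
        t = {}
        for rest in itertools.product(range(q), repeat=n - 1):
            t[_with(rest, j, 0)] = s[_with(rest, j, 0)]
            for v in range(1, q):                # running sum along coordinate j
                t[_with(rest, j, v)] = s[_with(rest, j, v)] + t[_with(rest, j, v - 1)]
        s = t
    return s
```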

The difference transformation is performed in exactly the same manner, except for the updating rule:

Algorithm 3 (Fast difference transform).

Initialize: $s_0 = f$;
For $j = 1, \ldots, n$ {
    For each $(x'_{j^-}, x_{j^+}) \in \mathcal{X}^{n-1}$ {
        Initialize: $s_j(x'_{j^-}, x'_j = 1, x_{j^+}) = s_{j-1}(x'_{j^-}, x_j = 1, x_{j^+})$;
        For $x'_j = 2, \ldots, |\mathcal{X}|$ {
            $s_j(x'_{j^-}, x'_j, x_{j^+}) = s_{j-1}(x'_{j^-}, x'_j, x_{j^+}) - s_{j-1}(x'_{j^-}, x'_j - 1, x_{j^+})$;
        }
    }
}
Return $s_n$;
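A matching Python sketch of Algorithm 3, under the same assumptions (dictionary over $\{0,\ldots,q-1\}^n$, 0-indexed). Unlike the cumulus pass, the subtraction reads the previous stage $s_{j-1}$ on both sides, which is what makes it the inverse of the prefix sum.

```python
import itertools


def _with(rest, j, v):
    """Insert value v as coordinate j into the (n-1)-tuple rest."""
    return rest[:j] + (v,) + rest[j:]


def fast_difference(F, q, n):
    """Algorithm 3: along each coordinate, s_j(.., x', ..) = s_{j-1}(.., x', ..) - s_{j-1}(.., x'-1, ..)."""
    s = dict(F)                                  # s_0 = F
    for j in range(n):
        t = {}
        for rest in itertools.product(range(q), repeat=n - 1):
            t[_with(rest, j, 0)] = s[_with(rest, j, 0)]
            for v in range(1, q):
                # both terms come from the previous stage s, not from t
                t[_with(rest, j, v)] = s[_with(rest, j, v)] - s[_with(rest, j, v - 1)]
        s = t
    return s
```

Applying it to a cumulus transform computed by brute force recovers the original function, in line with the two transforms being mutually inverse.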

7.2 Sum-product algorithm 

Given an NFG $\mathcal{G}$ with no external edges, in many circumstances one may be interested, for each edge $e$ in the NFG, in computing the marginal exterior function [18] defined as

$$Z_{\mathcal{G}}(x_e) := \sum_{x_{E\setminus\{e\}}} \prod_{v\in V} f_v(x_{E(v)}).$$

It is possible to compute such marginals using the elimination algorithm by imposing an elimination ordering in which $x_e$ appears last. (More precisely, the elimination algorithm is slightly modified here so that when it reaches $x_e$, it multiplies the functions of the two nodes incident with $e$ without summing out $x_e$.) Although the elimination algorithm is exact for any NFG, it may be expensive to perform, as it is likely to produce nodes with large degrees. Further, in addressing the marginals problem above, the elimination algorithm is repeated for each marginal, giving rise to redundant computations, since most of the computations used in evaluating one marginal can be reused in determining other marginals. The sum-product algorithm (SPA) is an efficient alternative in the case of NFGs with no cycles. We refer the reader to [3] for an excellent exposition of the SPA on FGs, and to [??], [18] for its formulation on NFGs.

Let $\mathcal{G}$ be a tree NFG (i.e., an NFG whose underlying graph is a tree) with vertex set $V$ and edge set $E$ comprised entirely of internal edges, i.e., $\mathcal{G}$ has no external edges. To facilitate notation, if nodes $u$ and $v$ are neighbors, we use $e_{uv}$ to denote the edge $\{u, v\}$; note that $e_{uv} = e_{vu}$. The SPA defines a set of messages that are passed between adjacent nodes, where we use $\mu_{u\to v}$ to denote the message passed from node $u$ to node $v$. The messages are governed by the following update rule (Fig. 25):

$$\mu_{u\to v}(x_{e_{uv}}) = \sum_{x_{E(u)\setminus\{e_{uv}\}}} f_u(x_{E(u)}) \prod_{v'\in \mathrm{ne}(u)\setminus\{v\}} \mu_{v'\to u}(x_{e_{uv'}}),$$



where a node $u$ sends its message to an adjacent node $v$ only if it has received the messages of all its other neighbors, and the algorithm terminates when every node has sent a message to all its neighbors. When $f_u$ is the indicator function $\delta_\Sigma$ or $\delta_{\max}$, it is not hard to see that the complexity of computing a message reduces to $(\deg(u) - 2)|\mathcal{X}|^2$.
Further, it is clear that when $f_u$ is the equality indicator, the message sent to node $v$ is simply the product of the incoming messages from all other neighbors of $u$, i.e.,

$$\mu_{u\to v}(x_{e_{uv}}) = \prod_{v'\in\mathrm{ne}(u)\setminus\{v\}} \mu_{v'\to u}(x_{e_{uv}}),$$

and the complexity of computing such a message is $(\deg(u) - 2)|\mathcal{X}|$.

Note that upon termination, the SPA will have computed $2|E|$ messages, two per edge, and it is possible to show, due to the tree structure, that each marginal $Z_{\mathcal{G}}(x_e)$ is given as the product of the two messages carried by the edge $e$. That is, for any $e_{uv} \in E$, we have

$$Z_{\mathcal{G}}(x_{e_{uv}}) = \mu_{u\to v}(x_{e_{uv}})\,\mu_{v\to u}(x_{e_{uv}}).$$

We remark that although the SPA was formulated on NFGs with no external edges, this does not present a serious limitation, as in many applications one converts the external edges, if present, into regular ones by gluing the constant-one or the evaluation indicators. (For each external edge, the choice of indicator function depends on the quantity of interest and on whether the edge is observed as "evidence" or not, cf. Section 8.) However, if one insists on NFGs with external edges, then the SPA still works for such NFGs, and it is possible to show that in this case, if $L$ is the set of external edges, then for any internal edge $e_{uv} \in E$, the product $\mu_{u\to v}\mu_{v\to u}$ is equal to $Z_{\mathcal{G}}(x_{e_{uv}}, x_L)$, defined as $Z_{\mathcal{G}}(x_{e_{uv}}, x_L) := \sum_{x_{E\setminus\{e_{uv}\}}} \prod_{w\in V} f_w(x_{E(w)})$, and hence the exterior function is given as $Z_{\mathcal{G}}(x_L) = \sum_{x_{e_{uv}}} Z_{\mathcal{G}}(x_{e_{uv}}, x_L)$. However, in this case, the SPA might be expensive to perform, since the size of each message increases every time it is passed by a node with a dangling edge, as the message accumulates the variable of that edge as an argument.
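The update rule and the product-of-messages marginal can be checked on a small tree with the Python sketch below. The node/edge representation (a dictionary from node name to its incident edge names and a value table, all edges over $\{0,\ldots,q-1\}$) and the function names are our assumptions. Messages are memoized, so each of the $2|E|$ messages is computed once; the sketch evaluates the generic rule at cost $q^{\deg(u)}$ per message and does not exploit the cheaper special forms discussed above.

```python
import itertools
from functools import lru_cache


def spa_marginals(nodes, q):
    """Sum-product algorithm on a tree NFG with no external edges.

    nodes maps a node name to (edges, table); table maps tuples of edge values
    (ordered as edges) to numbers. Returns {e: [mu_{u->v}(x) * mu_{v->u}(x) for x]}.
    """
    ends = {}                                    # edge name -> [u, v]
    for u, (edges, _) in nodes.items():
        for e in edges:
            ends.setdefault(e, []).append(u)

    def across(e, u):                            # node at the other end of edge e
        a, b = ends[e]
        return b if a == u else a

    @lru_cache(maxsize=None)
    def message(u, e):
        """mu from u to its neighbor across e, as a tuple indexed by x_e."""
        edges, table = nodes[u]
        k = edges.index(e)
        out = [0] * q
        for cfg in itertools.product(range(q), repeat=len(edges)):
            p = table[cfg]                       # local function f_u
            for e2, x in zip(edges, cfg):        # times incoming messages on other edges
                if e2 != e:
                    p *= message(across(e2, u), e2)[x]
            out[cfg[k]] += p                     # sum out every edge except e
        return tuple(out)

    return {e: [message(a, e)[x] * message(b, e)[x] for x in range(q)]
            for e, (a, b) in ends.items()}


def brute_marginals(nodes, q):
    """Z(x_e) by summing the full product over all edge configurations (for checking)."""
    all_edges = sorted({e for edges, _ in nodes.values() for e in edges})
    marg = {e: [0] * q for e in all_edges}
    for cfg in itertools.product(range(q), repeat=len(all_edges)):
        x = dict(zip(all_edges, cfg))
        p = 1
        for edges, table in nodes.values():
            p *= table[tuple(x[e] for e in edges)]
        for e in all_edges:
            marg[e][x[e]] += p
    return marg
```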

Below, we give a simple example illustrating the SPA.

Example 7 (SPA example). Given the NFG in Fig. 26, we have

$$\mu_{1\to 2}(y_1) = f_1(y_1), \qquad \mu_{4\to 3}(y_3) = f_4(y_3), \qquad \mu_{5\to 3}(y_4) = f_5(y_4), \qquad \ldots$$

and it is clear by direct substitution that, for instance, …

Figure 25: SPA update rule, where the message sent from node $u$ to its neighbor $v$ is computed using all the messages arriving at $u$ from all its other neighbors and its local function $f_u$, namely, $\mu_{u\to v} = \langle f_u, \mu_{v'\to u}, \ldots, \mu_{v''\to u}\rangle$.

Figure 26: SPA example.

Another example is provided below in relation to the difference transform and the "derivative sum-product algorithm" of Huang and Frey [19].



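Example 8. Let $\mathcal{G}$ be as in Fig. 27. The SPA computes the following messages:

$$\mu_{f_1\to q_1}(y_1) = f_1(y_1), \qquad \mu_{q_2\to f_3}(y_2) = \mu_{f_2\to q_2}(y_2)\,\mu_{d_2\to q_2}(y_2);$$
$$\mu_{q_1\to f_2}(y_1) = \mu_{f_1\to q_1}(y_1)\,\mu_{d_1\to q_1}(y_1), \qquad \mu_{q_2\to f_2}(y_2) = \mu_{d_2\to q_2}(y_2)\,\mu_{f_3\to q_2}(y_2);$$
$$\mu_{f_2\to q_2}(y_2) = \sum_{y_1} f_2(y_1, y_2)\,\mu_{q_1\to f_2}(y_1), \qquad \mu_{f_2\to q_1}(y_1) = \sum_{y_2} f_2(y_1, y_2)\,\mu_{q_2\to f_2}(y_2);$$
$$\mu_{f_3\to q_2}(y_2) = f_3(y_2), \qquad \mu_{q_1\to f_1}(y_1) = \mu_{f_2\to q_1}(y_1)\,\mu_{d_1\to q_1}(y_1);$$
$$\mu_{q_2\to d_2}(y_2) = \mu_{f_2\to q_2}(y_2)\,\mu_{f_3\to q_2}(y_2), \qquad \mu_{q_1\to d_1}(y_1) = \mu_{f_1\to q_1}(y_1)\,\mu_{f_2\to q_1}(y_1);$$
$$\mu_{d_2\to u_2}(x_2) = \sum_{y_2} D(x_2, y_2)\,\mu_{q_2\to d_2}(y_2), \qquad \mu_{d_1\to u_1}(x_1) = \sum_{y_1} D(x_1, y_1)\,\mu_{q_1\to d_1}(y_1).$$

By noting that $\mu_{d_1\to q_1}(y_1) = D(\bar{x}_1, y_1)$ and $\mu_{d_2\to q_2}(y_2) = D(\bar{x}_2, y_2)$, one can see by direct substitution that

$$\mu_{d_1\to u_1}(x_1) = \sum_{y_1} D(x_1, y_1)\, f_1(y_1) \sum_{y_2} D(\bar{x}_2, y_2)\, f_2(y_1, y_2)\, f_3(y_2)$$

and

$$\mu_{d_2\to u_2}(x_2) = \sum_{y_2} D(x_2, y_2)\, f_3(y_2) \sum_{y_1} D(\bar{x}_1, y_1)\, f_2(y_1, y_2)\, f_1(y_1).$$

If we use the notation $\partial_x f(x, y) := \sum_z D(x, z) f(z, y)$ to denote the difference transform of a function $f$ (with respect to $x$), then the above two messages can equivalently be written as

$$\mu_{d_1\to u_1}(x_1) = \partial_{x_1}\Big[f_1(x_1)\,\partial_{x_2}\{f_2(x_1, x_2)\, f_3(x_2)\}\big|_{x_2 = \bar{x}_2}\Big],$$

which is the difference transform of the product of all hidden functions with respect to $x_1, x_2$, evaluated at $x_2 = \bar{x}_2$, and similarly

$$\mu_{d_2\to u_2}(x_2) = \partial_{x_2}\Big[f_3(x_2)\,\partial_{x_1}\{f_2(x_1, x_2)\, f_1(x_1)\}\big|_{x_1 = \bar{x}_1}\Big],$$

which is the difference transform evaluated at $x_1 = \bar{x}_1$.

Figure 27: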

Let $\mathcal{G}$ be a tree NFG that is also a constrained NFG with a set $I$ of interface nodes consisting of equality indicators. Let $\mathcal{G}'$ be the NFG obtained from $\mathcal{G}$ by gluing a difference function onto each dangling edge, and let $\mathcal{G}''$ be the NFG resulting from $\mathcal{G}'$ by gluing an evaluation function onto the other end of each such difference function, cf. Fig. 27. Performing the SPA on $\mathcal{G}''$, it is clear that the discussion in the previous example extends to any such $\mathcal{G}''$. That is, the message $\mu_{d_i\to u_i}(x_i)$ is the difference transform of the product of the hidden functions with respect to $x_I$, evaluated at $x_{I\setminus\{i\}} = \bar{x}_{I\setminus\{i\}}$, where $d_i$ is the difference node adjacent to interface node $i$ and $u_i$ is the evaluation node adjacent to $d_i$. This is the "derivative-sum-product" algorithm of [19] in the case of finite alphabets.

We close with two remarks. First, in performing the SPA over $\mathcal{G}''$, one may use the fast difference algorithm to compute the messages emitted by the difference nodes, making the complexity of computing such messages linear in the alphabet size. Second, as one may expect, marginalization by summation on $\mathcal{G}'$ is marginalization by evaluation on $\mathcal{G}$, and the difference function guarantees the conversion between these two notions of marginalization, as demonstrated in the example below.



Example 8 (Continued). Assume the function of node $u_1$ in the NFG in Fig. 27 is replaced with the constant-one indicator (this amounts to marginalizing $x_1$ by summing it out), and assume all variables take their values from a set $\mathcal{X}$. Then we have $\mu_{u_1\to d_1}(x_1) = 1$, and from the definition of the difference function, it is clear that $\mu_{d_1\to q_1}(y_1) = \delta_{|\mathcal{X}|}(y_1)$. By direct substitution, it follows that

$$\mu_{d_2\to u_2}(x_2) = \sum_{y_2} D(x_2, y_2)\, f_1(|\mathcal{X}|)\, f_2(|\mathcal{X}|, y_2)\, f_3(y_2),$$

or equivalently,

$$\mu_{d_2\to u_2}(x_2) = \partial_{x_2}\big[f_1(|\mathcal{X}|)\, f_2(|\mathcal{X}|, x_2)\, f_3(x_2)\big].$$

That is, the message $\mu_{d_2\to u_2}$ is the difference transform with respect to $x_2$ of the product of the hidden functions, with $x_1$ marginalized by evaluation at $|\mathcal{X}|$. One may verify that $\mu_{d_1\to u_1}$ remains unchanged.
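A numerical check of the last identity, as a Python sketch under stated assumptions: the alphabet is $\{0,\ldots,q-1\}$ (so the largest element, written $|\mathcal{X}|$ in the text, is $q-1$), the difference kernel is taken to be $D(x, y) = [y = x] - [y = x-1]$ (our reading of the difference function, consistent with Algorithm 3), and $f_1, f_2, f_3$ are small arbitrary tables. The messages follow the list in Example 8 with $u_1$ replaced by the constant-one function.

```python
def D(x, y):
    """Assumed difference kernel: sum_y D(x, y) g(y) = g(x) - g(x - 1), with g(-1) = 0."""
    return (1 if y == x else 0) - (1 if y == x - 1 else 0)


def message_d2_to_u2(f1, f2, f3, q):
    """SPA messages of Example 8 with u1 replaced by the constant-one function."""
    mu_d1_q1 = [sum(D(x1, y1) * 1 for x1 in range(q)) for y1 in range(q)]   # mu_{u1->d1} = 1
    mu_q1_f2 = [f1[y1] * mu_d1_q1[y1] for y1 in range(q)]                     # equality node q1
    mu_f2_q2 = [sum(f2[y1][y2] * mu_q1_f2[y1] for y1 in range(q)) for y2 in range(q)]
    mu_q2_d2 = [mu_f2_q2[y2] * f3[y2] for y2 in range(q)]                     # equality node q2
    return [sum(D(x2, y2) * mu_q2_d2[y2] for y2 in range(q)) for x2 in range(q)]


def closed_form(f1, f2, f3, q):
    """Difference transform in x2 of f1(top) f2(top, x2) f3(x2), top = largest symbol."""
    top = q - 1
    g = [f1[top] * f2[top][x2] * f3[x2] for x2 in range(q)]
    return [sum(D(x2, y2) * g[y2] for y2 in range(q)) for x2 in range(q)]
```

The first line of `message_d2_to_u2` reproduces $\mu_{d_1\to q_1} = \delta_{|\mathcal{X}|}$, since the column sums of $D$ vanish except at the largest symbol.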

7.3 Indirect evaluation of the exterior function 

Below, we discuss an indirect method for finding the exterior function of a bipartite NFG $\mathcal{G}$, where a holographic transformation, i.e., one that preserves the exterior function, is applied to the NFG a priori in the hope of facilitating the computation. The idea is to perform a transformation, if one exists, that replaces each function $f_i$ with an equality indicator, and hence benefit from the low computational complexity of such nodes in the elimination algorithm or the SPA. Of course, one might not be able to find a holographic transformation that converts each function $f_i$ into an equality indicator, in which case one may settle for one that produces this effect for a subset $I' \subseteq I$. For this approach to work, it is necessary that: 1) there exists a set of transformers $\{g_e : e \in E\}$ such that $\langle f_i, g_e : e \in E(i)\rangle = \delta_=$ for all $i \in I'$, for some non-empty $I'$; and 2) there exist efficient algorithms for computing the transformations induced by the $g_e$. The following example demonstrates this approach.

Example 9. Let $\mathcal{X}$ be a finite ordered set and consider the NFG in Fig. 28(a), where each edge variable takes its values from the partially ordered set $\mathcal{X}^n$ for some integer $n$. Directly computing the exterior function requires $|\mathcal{X}|^{2n}$ operations. However, the exterior function may be computed using a number of operations of order $n|\mathcal{X}|^n$ by first computing the fast cumulus transforms of $f_1$ and $f_2$, multiplying the two resulting functions, and then inverting the result using the fast difference transform, Fig. 28(c). The exterior function is invariant under this procedure, since it simply amounts to the holographic transformation in Fig. 28(b), which is equivalent to the NFG in (a) by the GHT and to the one in (c) by Lemma ??.
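A Python sketch of the procedure in Example 9. Since Fig. 28 is not reproduced here, we assume that the node joining $f_1$ and $f_2$ in Fig. 28(a) is the coordinatewise max indicator on $\mathcal{X}^n$, so that the exterior function is $g(z)=\sum_{\max(x,y)=z} f_1(x) f_2(y)$; the alphabet is taken as $\{0,\ldots,q-1\}$. The cumulus transform turns this "max-convolution" into a pointwise product, because $\max(x,y)\le z$ exactly when $x\le z$ and $y\le z$.

```python
import itertools


def _axis_pass(s, q, n, j, op):
    """Apply a 1-D map op (list of length q -> list of length q) along coordinate j."""
    t = {}
    for rest in itertools.product(range(q), repeat=n - 1):
        keys = [rest[:j] + (v,) + rest[j:] for v in range(q)]
        row = [s[key] for key in keys]
        for key, val in zip(keys, op(row)):
            t[key] = val
    return t


def cumulus(f, q, n):
    for j in range(n):
        f = _axis_pass(f, q, n, j, lambda r: list(itertools.accumulate(r)))
    return f


def difference(F, q, n):
    for j in range(n):
        F = _axis_pass(F, q, n, j, lambda r: [r[0]] + [r[v] - r[v - 1] for v in range(1, q)])
    return F


def max_convolution_fast(f1, f2, q, n):
    """Cumulus transforms -> pointwise product -> difference transform (order n * q**n)."""
    F1, F2 = cumulus(f1, q, n), cumulus(f2, q, n)
    return difference({z: F1[z] * F2[z] for z in F1}, q, n)


def max_convolution_direct(f1, f2, q, n):
    """Direct evaluation over all q**(2n) pairs (x, y)."""
    g = {z: 0 for z in f1}
    for x in f1:
        for y in f2:
            g[tuple(max(a, b) for a, b in zip(x, y))] += f1[x] * f2[y]
    return g
```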

The discussion above parallels the well-known fast Fourier transform approach to computing the convolution of functions. In this case, the fast Fourier transform is used to reduce the complexity of computing the convolution of two functions defined on $\mathcal{X}^n$ (assuming $\mathcal{X}$ is a finite abelian group) from order $|\mathcal{X}|^{2n}$ to $n\log(|\mathcal{X}|)|\mathcal{X}|^n$ by first computing the fast Fourier transforms of $f_1$ and $f_2$, multiplying the two resulting functions, and then inverting the result using the inverse fast Fourier transform. This is justified by the relation between the sum and parity indicators, and the duality of the equality and parity indicators under the Fourier transform, Fig. 28(d)-(f).
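For comparison, a Python sketch of the Fourier route on $\mathcal{X}=\mathbb{Z}_q$ with $n = 1$. For readability it uses a naive $O(q^2)$ discrete Fourier transform; an FFT would replace `dft` to obtain the $\log(|\mathcal{X}|)$ factor quoted above. The function names are ours.

```python
import cmath


def dft(f, sign=-1):
    """Naive DFT on Z_q: F(k) = sum_x f(x) exp(sign * 2*pi*i*k*x / q)."""
    q = len(f)
    return [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * k * x / q) for x in range(q))
            for k in range(q)]


def cyclic_convolution_via_dft(f1, f2):
    """(f1 * f2)(z) = sum_{x + y = z mod q} f1(x) f2(y) as the inverse DFT of a pointwise product."""
    q = len(f1)
    prod = [a * b for a, b in zip(dft(f1), dft(f2))]
    return [v.real / q for v in dft(prod, sign=+1)]


def cyclic_convolution_direct(f1, f2):
    """Direct evaluation over all q**2 pairs, for checking."""
    q = len(f1)
    g = [0] * q
    for x in range(q):
        for y in range(q):
            g[(x + y) % q] += f1[x] * f2[y]
    return g
```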

Figure 28: Indirect computation of the exterior function, where $F_j$ and $\hat{f}_j$ are the cumulus and Fourier transforms of $f_j$, respectively.

8 The inference problem

Given an NFG $\mathcal{G}(V, E, f_V)$ representing a set of RVs $X_L$ (that is, the exterior function of $\mathcal{G}$ is the probability distribution $p_{X_L}$), the inference problem is to marginalize a set $M \subseteq L$ of the RVs and to evaluate $p_{X_L}$ at some observed values (evidence) of the RVs $N \subseteq L$, where $M$ and $N$ are disjoint. That is, the problem is to find

$$p_R(x_R, \bar{x}_N) = \sum_{x_M} p_{X_L}(x_R, x_M, x_N)\Big|_{x_N = \bar{x}_N},$$

where $R = L\setminus(M\cup N)$. We remark that, in general, $p_R$ is not a probability distribution over the RVs $X_R$. It is rather a distribution over $X_R$ up to scale; namely, it is the conditional distribution $p_{X_R|X_N}(x_R \,|\, x_N = \bar{x}_N)$ up to the scaling constant $p_{X_N}(\bar{x}_N)$. Evidently, the complexity of the inference problem depends primarily on the factorization structure of $p_{X_L}$, which is reflected by the graphical structure of the NFG. In order to perform the desired inference, we define $\mathcal{G}^*$ as the NFG whose exterior function is the desired function $p_R$. That is, we define $\mathcal{G}^*$ as the NFG obtained from $\mathcal{G}$ by: 1) converting each dangling edge $e \in M$ into a regular edge by gluing a new vertex $u_e$ to $e$, where $u_e$ is associated with the constant-one function; and 2) converting each dangling edge $e \in N$ into a regular edge by gluing a new vertex $v_e$ to $e$, where $v_e$ is associated with the evaluation indicator $\delta_{\bar{x}_e}$. An example is shown in Fig. 29, where the original NFG is as in (a). Assuming we are interested in $\sum_{x_3} p_{X_1X_2X_3}(x_1, x_2, x_3)|_{x_2 = \bar{x}_2}$, $\mathcal{G}^*$ is as in (b).




Figure 29: Inference: (a) an example NFG $\mathcal{G}$ representing $p_{X_1X_2X_3}(x_1, x_2, x_3)$; (b) the resulting $\mathcal{G}^*$, assuming we are interested in $\sum_{x_3} p_{X_1X_2X_3}(x_1, x_2, x_3)|_{x_2 = \bar{x}_2}$.

Clearly the inference problem is encoded in $\mathcal{G}^*$, and hence reduces to computing the exterior function of $\mathcal{G}^*$, which can be performed by invoking the elimination algorithm on $\mathcal{G}^*$. Given this equivalence between inference and the computation of the exterior function, one can always assume that the given NFG represents the desired computation, i.e., that the NFG is already reduced to the desired inference form $(\cdot)^*$.
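The quantity encoded by $\mathcal{G}^*$ can be illustrated with a small Python sketch that works directly on an explicit joint table (a toy stand-in for the exterior function of $\mathcal{G}$; the function name `infer` and the table layout are our assumptions). Summing over $x_M$ plays the role of the glued constant-one functions, and fixing $x_N$ plays the role of the evaluation indicators; normalizing the result recovers $p_{X_R|X_N}$.

```python
import itertools


def infer(joint, q, R, M, N, evidence):
    """p_R(x_R) = sum over x_M of joint(x_R, x_M, x_N) evaluated at x_N = evidence.

    joint maps full assignments (tuples indexed by variable position) to probabilities;
    R, M, N are disjoint lists of positions covering all variables, and evidence maps
    each position in N to its observed value.
    """
    out = {}
    for xr in itertools.product(range(q), repeat=len(R)):
        total = 0.0
        for xm in itertools.product(range(q), repeat=len(M)):   # constant-one glued on M
            x = [None] * (len(R) + len(M) + len(N))
            for pos, v in zip(R, xr):
                x[pos] = v
            for pos, v in zip(M, xm):
                x[pos] = v
            for pos in N:                                       # evaluation indicator on N
                x[pos] = evidence[pos]
            total += joint[tuple(x)]
        out[xr] = total
    return out
```

For the query of Fig. 29, one would call `infer(joint, q, R=[0], M=[2], N=[1], evidence={1: value})`, using 0-based positions for $X_1, X_2, X_3$.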

We remark that in a constrained model, by Proposition 2, we may assume that each interface node is an equality indicator. Hence, for each evaluated RV, i.e., for each $i \in N \subseteq I$, we may 1) for each neighbor $j \in \mathrm{ne}(i)$ of $i$, connected to $i$ by edge $e$, replace $f_j$ with $f_j(x_{E(j)})|_{x_e = \bar{x}_e}$ and delete $e$, and 2) delete node $i$. Hence, the inference problem over constrained models is simply a marginalization problem, and one may always assume that $N$ is empty.

On the other hand, for a conditional function $f(x, y)$ of $x$ given $y$, $\sum_x f(x, y)$ is a constant $c$ independent of $y$. Hence, in a generative model $\mathcal{G}$ with vertex set $I \cup J$, for each marginalized RV, i.e., for each $i \in M \subseteq I$, we may 1) absorb the constant $c$ into one of the neighbors of $i$ by replacing $f_j$ with $c f_j$ for some $j \in \mathrm{ne}(i)$, 2) for each neighbor $j \in \mathrm{ne}(i)$ of $i$, connected to $i$ by edge $e$, replace $f_j$ with $\sum_{x_e} f_j(x_{E(j)})$ and delete $e$, and 3) delete node $i$. Hence, the inference problem over generative models is simply an evaluation problem, and one may always assume that $M$ is empty.

9 Concluding Remarks 

In this paper we presented NFGs as a new class of probabilistic graphical models. We showed that 
this framework and the transformation technique herein unify various previous models, including 
multiplicative models, convolutional factor graphs, and the known transform-domain models. We 
focused on two dual categories of NFG models, constrained and generative NFG models, and 
revealed an interesting duality in their implied dependence structure. 

We believe that approaching learning and inference problems in a transform domain is methodologically appealing. The generic and flexible transformation technique introduced in this framework may prove powerful in that direction.

It is our hope that the ideas and the modelling framework presented in this paper will find new applications beyond the reach of conventional models.